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In  the  VLSI  aioroeleetronics  era,  the  cost  of  the  iaaense  CPU 
tine  and  aeaory  storage  for  a  'Standard"  circuit  siaolator  has  beeoae 
prohibitive.  In  order  to  achieve  draaatic  iaproveaent  in  the  perfor- 
aance  of  the  circuit  simulator*  there  are  two  principal  points  of 
departure  fron  the  ' Standard '  simulation  approach*  namely,  'tearing' 
decoaposition  and  'fcelaxatioaf'  decomposition.^ 

This  research  is  to  study  the  numerical  convergence  and  stabil¬ 
ity  properties  of  several  of  the  relaxation  algorithms  that  have  been 
proposed  for  the  siaulation  of  VLSI  circuits.  The  time-point  Gauss- 
Seidel  method  with  prediction,  the  exploitation  of  latency  and  event 
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CHAPTER  1 


INTRODUCTION 


Circuit  simulation  lias  becosie  a  significant  tool  in  the  design 
of  integrated  circuits.  ’Standard’  circuit  simulators,  such  as 
SPICE2  [1]  and  ASTAP  [2],  substantially  include  the  following  four 
algorithmic  techniques  : 

1)  Stiffly  stable  implicit  integration  methods,  such  as  the 
backward  Euler  or  Trapezoidal  formulas,  which  replace  the  original 
system  of  nonlinear  differential-algebraic  equations,  describing  the 
behavior  of  the  circuit,  into  a  system  of  nonlinear  algebraic  equa¬ 
tions. 

2)  Automatic  control  of  the  time  step  h  by  using  approxima¬ 
tion  differentiation  to  estimate  the  local  truncation  error  so  as  to 
ensure  accuracy  of  the  solution. 

3)  The  quadratically  convergent  Newton-Raphson  method  to 
solve  the  system  of  the  nonlinear  algebraic  equations  by  iteratively 
solving  a  sequence  of  linear  equations. 

4)  Sparse  Gaussian  elimination  methods  to  solve  the  linear 
algebraic  equations  in  each  Newton-Raphson  iteration. 


Because  of  the  good  numerical  stability  and  accuracy  properties 
of  the  implicit  integration  nethods,  these  circuit  simulators  can 
handle  a  vide  variety  of  ordinary  differential  equations  very  well. 
Also  vith  the  variable  tine  step  size  control  technique,  these 
methods  have  a  simulation  speed  advantage  in  stiff  systems  (widely 
separated  eigenvalues),  because  one  can  vary  the  integration  step 
size  according  to  the  rate  of  change  of  the  response  and  not 
encounter  numerical  stability  problems.  For  nonlinear  equations, 
Newton-Raphson  iteration  can  be  used  to  achieve  convergence  over  a 
wide  range  of  integration  step  sizes.  Finally,  modified  nodal 
methods  are  used  to  formulate  the  circuit  equations  because  they  are 
efficient  and  result  in  sparse  arrays.  Thus,  in  order  to  minimize  the 
number  of  numerical  operations,  these  equations  are  solved  by  means 
of  sparse  matrix  techniques. 

Although  these  circuit  simulators  are  very  efficient,  their 
memory  and  CHI  requirements  typically  limit  their  use  to  a  few  hun¬ 
dred  transistors.  However,  with  the  rapid  growth  in  the  scale,  meas¬ 
ured  in  device  count,  of  integrated  circuits  being  designed  in  the 
VLSI  microelectronics  era,  the  cost  of  the  immense  CPU  time  and 
memory  storage  for  'Standard? '  circuit  simulators  has  become  prohibi¬ 
tive.  In  order  to  achieve  even  faster  circuit  simulation  with  less 
memory  requirements,  one  must  take  advantage  of  some  of  the  special 
features  of  these  circuits,  such  as  (a)  the  repetitiveness  of  the 
circuit  (many  gates  are  of  the  same  type),  (b)  the  different  levels 
of  activity  of  the  various  gates  in  a  given  time  interval,  and  (c) 


the  el no st  one-way  propagation  of  the  signal  in  many  gates.  Henee  a 
series  of  new  generation  methods  of  circuit  simulation,  such  as  MOTTS 
[3],  MOTTS- C  [4],  DIANA  [5],  SPLICE  [6]  and  MACRO  [7]  have  been 
developed.  These  new  sianlators,  in  their  quest  for  speed,  elim¬ 
inated  one  or  aore  of  the  principal  features  of  the  *btaadard* '  simu¬ 
lator  in  order  to  make  a  favorable  trade-off  between  cost  (i.e.,  the 
CPU  tiae  and  aeaory  storage)  and  resolution  (i.e.,  the  attainable 
level  of  detail).  The  progrsa  SLATE  [8]  takes  advantage  of  items  (a) 
and  (b)  above  by  using  node  tearing  to  partition  the  circuit  into 
subcircuits  so  that  the  sparse  matrix  pointers  only  have  to  be  genr 
erated  and  stored  for  each  different  type  of  gate  and  not  for  all 
gates.  Secondly,  gates  that  are  latent  in  a  given  tiae  interval  can 
be  bypassed  in  the  solution  of  the  circuit  in  that  time  interval. 
These  improvements  have  resulted  in  approximately  a  factor  of  two 
improvement  in  both  memory  and  CPU  requireaents. 

However,  in  order  to  achieve  aore  draaatic  improvement  in  the 
performance  of  the  circuit  siaulator,  one  must  take  advantage  of  the 
one-way  propagation  of  the  signal  in  most  gates,  that  is,  the 
response  of  a  gate  has  little  effect  on  its  input  signals,  or  in 
other  words,  there  is  little  feedback  to  the  input  nodes.  When  this 
feedback  coupling  is  sufficiently  weak,  one  can  relax  certain  vari¬ 
ables  in  order  to  decouple  the  gates. 

There  are  two  principal  points  of  departure  from  the  "standard* 
simulation  approach  which  may  be  taken  at  any  of  the  three  main  lev¬ 
els  of  circuit  simulation  (i.e.,  the  time  level,  nonlinear  equation 


level  end  the  linear  equation  level),  naaely,  "tearing"  decomposition 
and  ’temporal”  deeoapoaition.  As  to  these  tvo  techniques,  the  foraer 
aias  to  retain  the  convergence  and  stability  properties  of  the  'Stan¬ 
ds  rtf'  aethod,  vhile  the  latter  is  related  to  the  so-called  "relaxa¬ 
tion"  or  'Indirect"  aethod  [9,10],  and  is  characterized  by  coapletely 
different  convergence  and  stability  properties. 

Ever  since  the  developaent  of  the  MOTIS  program  in  197S,  the 
first  relaxation-based  nonlinear  tine-domain  transient  circuit  simu¬ 
lator,  there  have  been  a  series  of  relaxationr-based  simulators 
developed.  However,  some  of  these  simulators  suffer  from  serious 
numerical  properties.  In  the  era  of  VLSI  circuits,  the  relaxation- 
based  techniques  seem  to  be  an  inevitable  tendency  [11]  in  order  to 
achieve  the  speed  necessary  to  analyze  large  circuits.  Thus,  a  com¬ 
plete,  deterministic  study  of  the  numerical  stability  and  convergence 
properties  of  relaxation  methods  is  a  valuable  and  interesting 
research  topic. 

This  research  is  concerned  with  the  investigation  of  the  numer¬ 
ical  stability  and  convergence  properties  of  the  tvo  commonly  used 
relaxation  methods,  waveform  and  time-point  (Gauss-Seidel) ,  as  a 
function  of  the  degree  of  coupling.  The  modified  waveform  relaxation 
aethod  for  the  simulation  of  VLSI  circuits  in  the  time  domain  is  stu¬ 
died.  This  approach  is  similar  to  the  waveform  relaxation  method  in 
RELAX  [9].  However,  the  entire  time  interval  is  separated  into  small 
tiae  windows.  Instead  of  a  sweeping  iteration  in  the  entire  time 
interval  [  0,  T  ],  the  sweep  iteration  is  processed  sequentially  in 


each  window  and  the  waveforms  are  concatenated.  The  experimental 
results  [12]  show  that  the  modified  technique  can  get  better  resolu¬ 
tion  with  reduced  cost,  thus  permitting  larger  systems  to  be  simu¬ 
lated. 

In  the  transient  analysis  of  NOS  circuits  in  which  the  floating 
gate  to  drain  capacitance  is  modeled,  the  pole- splitting  phenomenon 
[13]  occurs  when  the  transistors  are  active.  The  stiffness  of  the 
system  is  determined  by  the  degree  to  which  the  poles  split.  In  this 
research,  the  numerical  properties  of  the  waveform  relaxation  method 
are  studied  for  the  analysis  of  linear  stiff  systems.  Ve  examine  the 
numerical  properties  by  means  of  the  test  circuit  which  was  generated 
by  linearizing  the  model  of  a  cascade  of  two  inverters  in  which  the 
transistors  are  assumed  to  be  active.  When  the  window  size  is  shrunk 
to  the  time  step  size,  the  waveform  relaxation  method  is  equivalent 
to  a  time-point  relaxation  method.  The  time-point  relaxation  method 
considered  in  this  research  is  the  isodified  Gauss-Seidel  method  [9], 
which  uses  a  forward  predictor  for  the  unsolved  variables.  It  has 
been  demonstrated  to  be  an  efficient  technique  in  the  timing  analysis 
of  MDS  circuits.  The  numerical  properties  of  the  modified  Gauss- 
Seidel  method  are  also  observed  for  linear  stiff  systems. 

The  SLATE  program  is  modified  to  implement  the  relaxation  tech¬ 
niques  in  this  research.  Chapter  2  describes  briefly  the  different 
relaxation  techniques  used  in  electrical  simulation.  In  Chapter  3  the 
modified  Gauss-Seidel  technique  and  its  numerical  properties  for  the 
analysis  of  a  linear  stiff  system  are  discussed.  An  introduction  of 


the  waveform  relaxation  method  (WBM)  and  the  aodified  window  waveform 
relaxation  aethod  and  its  nnaerieal  properties  are  disenssed  in 
Chapter  4.  The  experimental  results  of  the  program  SLATE- R  (a  Simu¬ 
lator  with  Latency  and  Tearing  -  Relaxed  version)  are  shown  in 
Chapter  5.  Chapter  6  describes  the  implementations  of  the  SLATE-R 
program.  Finally*  the  conclusions  are  in  Chapter  7. 


REVIEW  OF  SIMULATION  TECHNIQUES 


Standard  circuit  sisulatora  have  proven  to  be  reliable  and 
effective  when  the  size  of  tbe  circuit  is  limited  to  several  hundred 
transistors.  As  the  size  of  the  circuit  increases,  the  primary 
memory  storage  and  CPU  time  used  by  these  simulators  increase  rapidly 
[14]  deapite  the  use  of  sparse  matrix  techniques.  In  order  to  simu¬ 
late  LSI  and  VLSI  circuits,  a  number  of  techniques  have  been  used  to 
improve  the  performance  of  the  standard  circuit  simulators.  Basi¬ 
cally.  these  nonstandard  simulators  can  be  meaningfully  classified  by 
the  decomposition  techniques. 

Decomposition  refers  to  the  technique  that  subdivides  the  whole 
set  of  the  system  equations  into  several  subsets.  Decomposition  can 
be  taken  at  any  of  the  three  main  levels  of  circuit  simulation  (i.e., 
the  time  level,  nonlinear  equation  level  and  linear  equation  level). 
Actually,  the  system  of  equations,  no  matter  at  what  level  it  is,  is 
processed  by  a  decomposition  technique  as  a  composition  of  several 
systems  with  interactions  between  them.  Once  the  system  is  decomposed 
into  subsystems,  the  technique  of  solving  each  subsystem  is  identical 
to  the  conventional  nmericsl  approaches. 

In  this  chapter,  we  will  briefly  review  the  analysis  techniques 
used  in  conventional  circuit  simulation.  Then  we  will  describe  the 
basic  concepts  and  properties  of  two  systematic  approaches  to 


accomplish  system  decomposition, 


24  gteafliHfl  Slssa.it  Simulators 

The  nonlinear  algebraic-differential  equations  which  describe 
the  performance  of  the  integrated  circuits  are  generally  of  the  fol¬ 
lowing  form  : 

f(x(t) ,x(t) ,u)«0  (2.1) 

E(x(0)-x  )-o  (2.2) 

o 

where  x  *  Rp  is  the  unknown  variable  at  time  t  with  the  given  ini¬ 
tial  value  xQ;  x  1*  the  time  derivative  of  x  at  time  t;  u  a  Rr  is 
the  vector  of  all  the  inputs  and  possibly  their  time  derivatives; 

f  ;  Rp  x  Rp  x  Rr  Rp  is  a  continuous  function;  and  E  a  Rnxp ,  n  £  p 

is  a  matrix  of  rank  n  such  that  E(x(t))  is  the  state  of  the  system  at 

ti*®  t.  Let  (  t^  ;  i  ■  0,1,..., N  }  denote  a  sequence  of  increasing 

time  points  selected  by  the  simulator  with  t  ■  o  and  t»  ■  T,  where  T 

o  n 

is  the  given  simulation  time  interval. 

By  using  an  implicit  integration  method,  the  system  of  equa¬ 
tions  (2.1)  is  transformed  into  a  discrete  time  sequence  of  nonlinear 
algebraic  equations.  At  each  time  point  t^,  the  corresponding  alge¬ 
braic  equation  can  be  written  as 


where  x*  denotes  the  computed  ▼sine  of  x(tj). 

The  solution  of  (2.3)  is  obtained  by  applying  the  Newtou- 
Raphson  method.  At  each  iteration  in  the  Newton-Raphson  method*  the 
resulting  linearised  equations  are  of  the  form: 

A  x  ■  b  (2.4) 

The  Newton-Raphson  iteration  is  carried  out  until  the  conver¬ 
gence  is  achieved  or  the  iteration  limit  is  exceeded.  At  each  itera¬ 
tion  count*  the  function  and  Jacobian  matrix  evaluations  are  neces¬ 
sary  to  construct  the  coefficient  matrix  A  in  Equation  (2.4).  Because 
in  the  circuit  simulation  environment,  the  matrix  A  is  usually  very 
sparse;  hence,  the  Gaussian  elimination  method  is  implemented  by 
using  a  sparse  matrix  technique  to  reduce  the  computational  opera¬ 
tions. 

It  is  very  important  to  exploit  the  sparse  technique,  since  the 
computational  complexity  of  the  Gaussian  elimination  method  applied 

to  an  nxn  full  matrix  is  proportional  to  n3  vhile  the  computational 
complexity  of  the  Gaussian  elimination  method  with  sparse  techniques 
in  on  the  average  [1]  proportional  to  n°  ;  a  s  [1.2,  1.5].  Figure  2.1 
shows  the  hierarchical  organixation  of  conventional  numerical  methods 
for  time  domain  simulation. 

As  the  sixe  of  the  circuit  increases  beyond  several  hundred 
transistors*  sparse  techniques  alone  are  not  enough  to  provide  simu¬ 
lation  results  in  a  reasonable  time.  In  the  next  section  we  will 
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Fig.  2.1  Hierachical  Organization  of  Conventional 
Cironit  S isolation. 


describe  the  features  of  systea  decoaposition,  which  have  led  to  the 
developaent  of  several  new  generations  of  circuit  s isolators. 


2.2  Svstea  Decoaposition 

The  new  generation  of  circuit  siaulators  for  large  digital  cir¬ 
cuits  uses  two  principles,  naaely  the  "tearing'  decoaposition  and  the 
'Velaxatioif'  decoaposition  [IS].  Coapared  with  the  conventional 
techniques,  the  tearing  approach  is  just  soae  special  reordering 
strategy.  Therefore,  the  coaputational  operations  of  the  tearing 
approach  depends  aainly  on  the  structure  of  the  systea.  If  the  systea 
structure  is  not  sparse  or  when  the  block  structure  of  the  systea  can 
not  be  exploited,  this  approach  does  not  offer  any  benefit  over  con¬ 
ventional  techniques.  In  other  words,  the  tearing  aethod  will  be 
powerful  only  if  it  is  contained  with  soae  other  strategies  such  as 
the  exploitation  of  latency  and  the  exploitation  of  the  repetitive¬ 
ness  of  a  liaited  nuaber  of  subeircuits. 

In  contrast,  the  techniques  classified  as  relaxation  aethods  are 
characterised  by  coapletely  different  convergence  and  stability  pro¬ 
perties.  Ve  will  discuss  the  nuaerieal  properties  of  the  relaxation 
techniques  in  detail  in  Chapter  3  and  Chapter  4. 


These  two  different  approaches  to  decoaposition  techniques  will 
be  described  in  aore  detail  in  the  next  sections. 


Tht  idea  of  tearing  decomposition  techniques  is  to  seleet  a  set 
of  tearing  variables  to  separate  the  entire  system  into  several  sub¬ 
systems.  There  are  two  different  approaches  to  tearing  decomposition, 
namely,  branch  tearing  and  node  tearing.  The  former  uses  tearing 
branches  as  the  tearing  variables  (Figure  2.2)  while  the  latter  uses 
tearing  nodes  as  the  tearing  variables  (Figure  2.3).  The  entire  sys¬ 
tem  is  torn  apart  into  several  subsystems  by  removing  these  tearing 
variables. 

Algebraically,  the  branch  tearing  method  is  equivalent  to  a  spe¬ 
cial  reordering  of  the  l^brid  analysis  equations,  whereas  the  node 
tearing  method  is  equivalent  to  a  special  reordering  of  nodal 
analysis  equations.  However,  both  methods  result  in  a  Bordered  Block 
Diagonal  (BBD)  matrix  structure  (Figure  2.4a).  Each  block 
corresponds  to  a  subsystem,  sad  the  border  corresponds  to  the  inter¬ 
connections.  The  more  general  matrix  form  that  is  suitable  for  tear¬ 
ing  decomposition  is  the  Border  Block  Lower  Triangular  (BBLT)  struc¬ 
ture  (Figure  2.4b). 

Tearing  decomposition  of  linear  algebraic  equations  can  be 
implemented  through  the  Block  LU  Factorization  [16] .  Let  us  consider 
the  system  of  equations  shown  in  Figure  2.5,  i.e., 

A  x  *  b 

where  x  *  [  vw]^eRnis  the  vector  of  the  unknown  variables  and  w 
is  the  vector  of  the  tearing  variables.  The  solution  strategy  has 
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been  to  first  eliminate  the  variables  v  from  the  system  of  equations 
to  obtain  the  following  reduced  subsystem  from  which  the  value  of  the 
tearing  variable  w  is  aohieved. 


(  E  -I®  *C  )w  *  b  -DB~*b 


where  the  corresponding  meanings  of  the  matrices  and  vectors  are 
given  in  Figure  2.5.  The  computed  solution  of  w  is  then  used  to  com¬ 
pute  the  solution  of  v  blockwise.  Some  other  techniques,  such  as  the 
Tearing  Algorithm  [16].  can  also  be  used  in  implementing  tearing 
decomposition.  However,  the  Tearing  Algorithm  is  just  some  particu¬ 
lar  form  of  the  Block  LD  Factorization  [17].  The  details  of  this 
approach  are  given  in  [16] . 

A  series  of  new  circuit  simulators,  such  as  SLATE  [8]  and  SANSON 
[18],  implement  tearing  decomposition  techniques  in  the  solution  of 
the  linear  algebraic  equations  and  give  reasonable  improvement. 
Actually,  SAMSON  is  a  mixed  mode  simulator  that  also  implements  a 
block  relaxation  technique  for  solving  the  nonlinear  algebraic  equa¬ 
tions.  We  will  discuss  the  relaxation  technique  later  in  this 
chapter. 

Multilevel  Newton-Kaphson  techniques  [7]  can  also  be  used  in 
solving  a  system  of  nonlinear  algebraic  equations  decomposed  by  means 
of  the  tearing  method.  Here  we  just  describe  this  procedure  briefly. 
The  solution  strategy  has  been  to  estimate  the  tearing  variables 
(e.g.,  current  or  voltage)  at  each  tearing  port  and  to  excite  the 


tors  subsystems  with  independent  sources  at  these  ports.  The  remain¬ 
ing  port  responses  are  computed  and  are  substituted  into  the  inter¬ 
connection  equations.  If  these  equations  are  not  satisfied,  another 


guess  is  made  of  the  variables  chosen  as  port  excitations.  This 
iterative  procedure  continues  until  convergence  is  aohieved.  MACRO 
[7]  is  an  example  of  the  simulator  that  implements  the  multilevel 
Newton-Raphson  method  for  solving  the  nonlinear  algebraic  equations. 

2.4  Relaxation  Decomposition 

In  the  tearing  decomposition  approach,  the  original  system  of 
equations  may  be  sparse  while  the  reduced  interconnection  matrix  may 
not.  Therefore,  the  computational  advantage  of  the  tearing  decomposi¬ 
tion  technique  over  the  conventional  circuit  simulator  depends  cru¬ 
cially  on  how  small  each  decomposed  subsystem  and  the  reduced  inter¬ 
connection  matrix  are. 

However,  in  the  relaxation  decomposition  approach,  the  system  of 
equations  is  simply  partitioned  into  several  subsystems.  There  is  no 
restriction  on  the  block  structure  of  the  system.  Vithin  each  sub¬ 
system,  the  variables  to  be  solved  for  sre  defined  as  internal  vari¬ 
ables  and  the  other  variables  are  defined  as  external  variables. 

The  solution  strategy  of  the  relaxation  decomposition  approach 
is  to  solve  each  subsystem  individually,  and  iterate  through  all  the 
subsystems  until  the  convergence  is  achieved.  In  order  to  solve  a 
subsystem  for  its  internal  variables,  the  subsystem  has  to  be  decou¬ 
pled  by  replacing  the  values  of  its  external  variables.  Usually,  it 


takes  a  number  of  iterations,  repeatedly  solving  each  of  the  decou¬ 
pled  subsystems,  so  that  the  values  of  the  external  variables  of  each 
subsystem  can  be  updated. 

There  are  two  vell-knovn  types  of  relaxation  techniques,  namely 
the  Gauss-Jacobi  (GJ)  relaxation  [3]  and  the  Ganss-Seidel  (GS)  relax¬ 
ation  [4].  These  approaches  as  veil  as  the  Ganss-Seidel  predictor 
technique  [10]  will  be  discussed  in  detail  in  Chapter  3.  Figure  2.6 
shows  the  use  of  the  relaxation  technique  at  various  levels  of  the 
system  of  equations.  The  relaxation  approach  applied  at  the  system 
level  of  the  nonlinear  differential  equations  such  as  RELAX  [9]  is 
the  so-called  waveform  relaxation  method  which  will  be  discussed  in 
Chapter  4.  The  relaxation  approach  applied  at  the  system  of  algebraic 
equations  such  as  MDTIS  [3],  MOTIS-C  [4],  , SPLICE  [6]  and  PREMOS  [10] 
is  the  time-point  relaxation  method  which  will  be  discussed  in 
Chapter  3. 

2  ,5.  Concluding  Remarks 

Ve  have  reviewed  various  decomposition  techniques  that  have 
been  proposed  and  implemented.  The  relaxation  approach  has  the  poten¬ 
tial  of  achieving  the  speeds  necessary  for  the  next  new  generation  of 
circuit  simulators  [11].  However,  the  relaxation  approach  does  not 
guarantee  that  the  sequence  of  the  iterated  solutions  will  converge 
to  the  exact  solution  unless  the  convergence  condition  on  the  parti¬ 


tioned  system  is  satisfied. 


RELAXATION 

Iteration 


System  of  Nonlinear 
Differential  Equations 


Fig.  2.6 


Hie  Use  of  Relaxation  Techniques  at  Different  Levels 
of  System  of  Equations. 
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The  study  of  the  relaxation  technique  at  different  levels  of  the 
systea  of  equations  is  still  open.  The  investigation  of  the  nnaerical 
convergence  and  stability  properties  of  the  relaxation  techniques  is 
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CHAPTER  3 


THE  MODIFIED  GAD  SS-  SEIDEL  METHOD  AND  ITS  NUMERICAL  PROPERTIES 


In  the  simulation  of  integrated  circnits  ,  relaxation  tech¬ 
niques  were  proposed  for  the  solution  of  sianltaneons  algebraic  equa¬ 
tions  in  order  to  speed  op  the  simulator  so  that  larger  circnits 
could  be  handled.  These  methods  are  iterative  and  convergence  depends 
on  the  coupling  among  the  variables.  Since  many  digital  MDS  circuits 
have  rather  weak  coupling  from  output  nodes  to  input  nodes  because 
the  input  is  the  high  impedance  gate,  these  methods  should  perform 
well  on  digital  MOS  circuits.  This  reasoning  led  to  the  development 
of  the  MOTTS  program  [3],  a  landmark  for  the  GAD  area.  Following  the 
MOTIS  simulator,  a  series  of  the  MDTIS-type  simulators,  >ach  as 
MOTIS-C  [4],  SPLICE  [ 6 ]  and  FRENOS  [10]  were  developed.  The  ~elas.- 
tion  techniques  used  in  these  simulators  decompose  the  system  at  the 
level  of  the  difference  equation. 

In  this  chapter,  we  describe  the  time— point  relaxation  tech¬ 
niques,  such  as  Gauss-Jscobi  relaxation  and  Gauss-Seidel  relaxation, 
together  with  a  study  of  their  convergence  and  stability  properties. 
This  study  reveals  why  time-point  relaxation  methods  work  well  on 
some  special  classes  of  circuits,  but  perform  very  poorly  on  other 
types  of  circuits. 
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1*1  Mathematical  Formulation 


He  systems  of  the  nonlinear  differential-algebraic  equationa 
which  deaoribe  the  performance  of  the  integrated  circuits  are  gen¬ 
eral  ly  of  the  following  form  : 


f (x(t) ,x(t) ,u)“0 


E(x(0)-xo)*o 


(3.1) 


(3.2) 


where  x  a  Rp  is  the  unknown  variables  at  time  t  with  the  given 

initial  value  x  ;  z  is  the  time  derivative  of  x  at  time  t;  u  e  Rr  is 
o' 

the  vector  of  all  the  inputs  and  their  time  derivatives; 

f  :  Rp  x  Rp  x  Rr  — >  Rp  is  a  continuous  function;  and  E  a  n  i  p 

is  a  matrix  of  rank  n  such  that  E(x(t))  is  the  state  of  the  system  at 

time  t.  Let  (  t^,  ;  i  *  0,1,..., N  }  denote  a  sequence  of  increasing 

time  points  selected  by  the  simulator  with  t  »  0  and  -  T  where  T 

o  w 

is  the  given  simulation  time  interval. 

At  every  time  point  tn.  Equation  (3.1)  can  be  approximated  by 
using  an  implicit  integral  formula,  suoh  as  the  backward  Euler  for¬ 
mula,  to  generate  a  set  of  nonlinear  algebraic  equations  of  the  fol¬ 
lowing  form: 


g(xn)  -  0 


(3.3) 


In  a  standard  circuit  simulator.  Equation  (3.3)  is  solved  by 
using  the  modified  Newton's  method  which  may  take  a  number  of 
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iterations  to  reach  the  sointioa.  At  each  iteration,  a  series  of 
algorithmic  procedures,  such  as  function  and  Jacobian  evaluations,  LU 
factorisation  and  aparse  matrix  sointioa  techniques,  are  repeated. 
The  linearized  equations  at  each  iteration  in  the  Newton's  method  are 
of  the  form: 


A  x  *b 


(3.4) 


3..1..1  Gauss- Jacobi  Relaxation  Method 


To  fit  the  fast  growth  of  VLSI  systems,  a  series  of  new  genera¬ 
tion  simulators  have  been  proposed  which  depart  radically  frost  the 
standard  algorithmic  techniques.  One  of  the  principal  points  of 
departure  from  the  standard  simulation  approach  is  relaxation  decom¬ 
position.  There  are  two  coaimon  types  of  time-point  relaxation  tech¬ 
niques  used  in  the  new  generation  simulators,  namely  the  Gauss-Jacobi 
relaxation,  e.g.,  in  the  MOTIS  program  [3],  and  the  Gauss-Seidel 
relaxation,  e.g.,  in  the  MOTIS-C  program  [4]. 


In  MOTIS,  the  components  of  xn  in  Equation  (3.3)  are  obtained 


one  at  a  time  by  solving  a  sequence  of  scalar  equations,  i.e.,  at 


ti®®  *u+i»  the  k-th  component  of  x11*^#  *  is  obtained  by  solving 


the  following  scalar  equation: 


o  /  0  _n  v&  _  n 

*kul*  x2*  xk-l*  xk*  xk+l*  xnr 


(3.5) 
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In  the  solution  sequence,  for  each  scalar  equation,  the  previ¬ 
ous  values  are  used  for  all  the  'exogenous'  variables.  This  process 
yields  an  approxiaation  to  the  solution  which  can  be  substituted  into 
Equation  (3.3)  to  determine  if  the  values  satisfy  the  equation.  If 
not,  the  process  is  repeated  and  hopefully  the  iterates  converge 
rapidly  to  the  solution. 


gauss- Seidel  Relaxation  Method 

MOTIS-C  used  the  Gauss-Seidel  algorithm  to  achieve  a  better 

result  than  that  of  MDTIS.  The  Gauss-Seidel  relaxation  aethod  is 

quite  siailar  to  the  Gauss-Jacobi  aethod,  but  recently  updated  values 

of  the  solved  variable  are  retained.  The  solution  sequence  is  to 

1 

solve  the  following  equation  for  xfc 


a  (r®11  x®*1  xn+l  _  n  n%  m  0 

*k(xl  '  x2  *  •*•*  xfc-l»  xk'  *fcfl»  V  0 


(3.6) 


The  process  is  repeated  until  the  sequence  converges  to  the 
solution  or  the  nuaber  of  iterations  becoaes  excessive.  If  each  sub- 
systea  has  only  one  internal  variable,  i.e. ,  the  x^'s  in  Equation 
(3.5)  and  Equation  (3.6)  are  scalar,  the  relaxation  approaches  are 
said  to  be  done  pointwise,  that  is,  point  Gauss-Jacobi  relaxation 
aethod  and  point  Gauss-Seidel  relaxation.  Otherwise,  it  is  said  to  be 
bloekwise.  Froa  the  network  point  of  view,  the  pointwise  relaxation 
aethods  are  equivalent  to  decomposing  the  network  at  each  node, 
whereas  the  bloekwise  relaxation  aethods  decompose  the  network  into 
subnetworks  which  nay  consist  of  aore  than  one  node. 
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In  the  Gauss- Seidel  method,  because  the  previous  values  are 
need  for  those  unsolved  variables,  it  has  been  found  that  usually  the 
sequence  converges  auch  aore  rapidly  to  the  solution.  Originally 
MOTIS  and  MDTIS-C  vere  prograaaed  to  only  do  the  first  iteration  and 
to  control  the  tiae  step  to  achieve  accuracy.  For  large-scale  cir¬ 
cuit  analysis,  this  one-sweep  approach  was  used  to  save  additional 
coaputational  steps.  In  order  to  achieve  less  error  and  better  con¬ 
vergence  for  the  unsolved  variables,  PREMOS  [10]  used  a  one-sweep 
Gauss- Seidel  technique  with  prediction  and  got  reasonable  iaproveaent 
in  the  transient  analysis  of  MDS  circuits.  This  technique  is 
described  below. 


jL.i.3  Modified  Gauss- Seidel  Relaxation  Method 


The  solution  strategy  of  the  modified  Gauss- Seidel  method  has 
been  to  use  a  forward  first-order  linear  predictor  for  the  unsolved 
variables.  That  is,  at  the  i-th  scalar  equation,  to  solve  for  x^, 
the  values  for  the  unsolved  variables  x^'s,  j  >  i  are  predicted 


according  to  the  following  formula: 


„o+l 


j, guess 


n  _  _a-l 


(3.7) 


n-1 


Experience  has  shown  that  of  the  above  time-point  relaxation 
methods,  the  Gauss-Seidel  method  with  prediction,  used  in  the  program 
PREMOS,  usually  performs  the  best  for  h  small  enough.  However,  the 
fact  that  all  three  methods  take  only  one  sweep  has  proven  to  cause 


accuracy  problems  in  some  circuits.  Thus  more  recent  versions  of 
MOTIS  [19]  and  SPLICE1  [20]  iterate  until  the  sequence  converges  to  a 
solution.  If  the  number  of  iterations  becomes  excessive,  the  time 


step  is  reduced  to  improve  convergence.  However,  this  approach  can 
become  computationally  inefficient  under  certain  conditions.  To 
understand  why,  the  convergence  and  stability  properties  of  these 
time-point  relaxation  methods  are  analyzed  in  the  next  section. 

1.1  Ihfi  Convergence  Stability  Properties 

In  order  to  study  the  stability  properties  of  numerical 
integration  methods,  a  simple  first-order  differential  equation  of 
the  form 

i  =  a  x  (3.8) 

is  chosen  by  numerical  analysts  as  a  test  vehicle.  Similarly,  in 
order  to  study  the  numerical  properties  of  relaxation  methods  used  to 
decouple  a  system  of  differential  equations  describing  the  behavior 
of  digital  circuits,  the  simple  linear  test  circuit  in  Figure  3.1  was 
chosen  [21].  This  test  circuit  was  generated  by  linearizing  the  model 
of  a  cascade  of  two  inverters  in  which  the  transistors  are  assumed  to 
be  active.  The  elements  i#,  g^  and  represent  the  Norton  equivalent 
circuit  at  the  output  of  the  first  inverter,  and  the  capacitor  Cc 
represents  the  coupling  (feedback)  from  the  output  node  of  the  second 
inverter  to  its  input  node  #1.  The  elements  C^,  c^,  g^  and  g2  are  all 
scaled  to  1,  and  gB  gad  Cc  are  adjustable. 


i 


The  transient  solution  to  this  test  circuit  is  of  the  fora 


-t/r,  -t/t, 

“  ki  •  +  • 


(3.9) 


The  dominant  natural  frequency  is  and  it  determines  how 


slowly  the  transient  response  decays.  One  can  estimate  the  dominant 


natural  frequency  using  the  Miller  effect  approximation.  In  the  next 


section,  we  examine  the  Gause-Seidel  relaxation  method  and  determine 


the  relation  between  the  stiffness  of  the  test  circuit  and  the  time 


step  h  needed  in  order  to  achiewe  good  convergence  to  the  solution. 


144  Camuiir  Si  Ilf  Time- Point  Gauss- Seidel 


Method 


The  nunerical  solution  is  begun  in  the  usual  way.  t?  it  is.  the 


differential-algebraic  Equations  (3.1)  are  converted  to  a  set  of 


algebraic  Equations  (3.3>  by  means  of  an  implicit  integration  for¬ 


mula.  The  equivalent  circuit  resulting  from  this  transf ormstion  is 


obtained  by  replacing  the  energy  storage  elements  with  their  compel 


ion  models.  For  instance,  the  node  equation  of  the  test  circuit  of 


Figure  3.1  can  be  expressed  in  the  following  form: 


VCc 


C2+Cc 


*.  *2 


(3.10) 


By  using  the  backward  Euler  formula  as  follows: 


(3.11) 


where  h&  is  the  size  of  the  time  step,  the  equivalent  circuit  is  as 
shown  in  Figure  3.2.  The  equations  for  this  circuit  are 


1 

Cl+Cc+tngl 

"Cc 

Vl.n 

h 

11 

Cc  +  ^n*m 

hn*2+C2+Cc_ 

V2.n 

(3.12) 


VCc 

^c  " 

V1 , n— 1 

\<*n> 

-C 

c 

C2+Cc 

v2,n-l 

0 

Note  that  if  Cq  s  0,  the  two  nodes  are  decoupled.  Thus  we  can 
solve  for  v^  first  (the  output  of  the  first  gate),  and  then  we  can 
solve  for  v^  (the  output  of  the  second  gate)  without  inverting  the 
circuit  matrix.  In  special  purpose  circuit  simulators  for  large 
digital  circuits,  this  is  essentially  what  is  done,  except  on  a  much 
larger  scale. 

If  o,  the  equations  (gates)  can  be  decoupled  by  relaxing 

certain  nodes  by  means  of  the  Oauss-Seidel  method.  For  instance,  for 
the  test  circuit,  we  express  Equation  (3.12)  in  the  form 


(3.13) 


(  L  ♦  D  )  va  -  -  v  vQ  +  B  v^  ♦  i# 


where 


L  +  D 


1  cl+Cc+hn*l 


-C  + 


V2+C2+C, 


(3.14) 


0  “C/hn 
c  & 


(3.15) 


The  €auss- Seidel  method  of  solution  for  Equation  (3.13)  is  the  itera- 


v(k)  *  -(L+D)-1  Uv(k  X)  +  Bv  ,  +  1 
n  n  “tt-i  —  s 


(3.16) 


where 


v  <°>  -  y  f 
n  n-1 


(3.17) 


-(L+D)_1D 


?  [ ;  'SC- 1 


No  matrix  inversion  is  required,  since  L+D  is  triangular.  A  suf¬ 
ficient  condition  for  these  iterates  to  converge  to  the  solution  is 


given  by  the  following  inequality: 


||(LfD)_1U||  <  1  (3.19) 

The  above  norm  is  a  function  of  the  step  size  hQ.  Using  the  1^ 
norm,  we  found  the  upper  bound  on  the  step  size  h^  for  the  range  of 
parameter  values  in  Table  3.1  such  that  Equation  (3.19)  is  satisfied. 
This  upper  bound.  hm#  i«  given  in  Table  3.2. 


Tafrlp  3.2 

time  steo  with  the  Gsuss-Seidel  methods 

case 

Cc(F) 

*m(S)  X1 

x2 

^unstable 

1 

0.01 

1 

1.1 

0.92 

• 

• 

2 

0.10 

1 

1.5 

0.80 

• 

• 

3 

I  $ 

1 

4.3 

0.70 

• 

• 

4 

0.01 

10 

1.4 

0.74 

• 

• 

5 

0.10 

10 

2.8 

0.43 

• 

* 

6 

1 

10 

13.8 

0.22 

1.0 

2.0 

7 

0.01 

100 

2.6 

0.39 

• 

• 

8 

0.10 

100 

12.1 

0.099 

0.16 

0.32 

9 

1 

100 

104.0 

0.029 

0.01 

11.9 

0.085 

0.14 

0.26 

11 

0.10 

102.0 

0.012 

0.025 

12 

1.00 

1000 

1004.0 

0.0030 

0.0050 

Criterion  satisfied  for  all  h  >  0 


The  ten  i*  the  step  size  st  which  the  solution 

be cones  unstable  in  the  modified  Gaos s- Seidel  method  which  will  be 
discnssed  in  the  next  section. 

From  Table  3.2  we  foond  that  in  order  to  satisfy  inequality 
(3.19)  the  step  size  h  has  to  be  approximately  less  than  or  equal  to 
the  size  of  the  smallest  time  constant  in  the  system.  For  example, 
for  Co  «  0.1  F  and  g1  ■  100  S,  then  h  <  0.16  s  is  sufficient  for  con¬ 
vergence.  If  ga  is  increased  to  1000  S,  then  h  <  0.013  s  is  suffi¬ 
cient  for  convergence.  Our  numerical  experiments  for  this  circuit 
concur  with  the  above  observations.  For  instance,  for  Cq  >  o.l  F  and 
=100  S,  if  h  =  0.1  s  the  iterative  solution  sequences  converged, 
but  they  did  not  converge  for  h  =  0.2  s. 


1*1  *1  Nb*«rig«l  Stability  ill  ihe  jEag-fugep  Gauss- Seidel  Iteration 
Fit)*  Prediction 

In  our  test  circuit  we  used  the  following  predictor 


v2,arl 


+  hn 


l  2 .  n~  1, 

'  h 


zrl 


(3.20) 


in  order  to  initislize  Equation  (3.16).  If  k  =  1  in  Equation  (3.16), 
then  upon  substitution  of  Equation  (3.20)  into  Equation  (3.16)  we 
obtain  the  difference  equation: 


: 

i 

a 


$ 

4 

s 

ij 

•i 


4 


f Cl+Cc  +  hn*l  o  1 

L  <-V  +  «mhn  hn*2  +C2+Cc  J 


in  + 


-(Cj+Cg) 


v  a  4 

n-1 

"(C2+C.> 


Vl 


r  o  cc  ]  .  r  *.<«.>  ] 

l  0  0  J  J«-2  ■  kn  1  0  ] 


(3.21) 


Applying  the  Z- transform  yields  the  following  characteristic 


polynomial : 


Pz3  +  Qz2  +  Rz  +S  »  0 


where  for  h  =  h  „  »  h 
n  n-i 


(3.22) 


P  -  (gjh  +  Cx  +  Cc)(g2h  +  C2  +  Cc) 
a  -  -(C2  +  (^(Cj  +  Cc  +glk)  -  (Cx  +  Cc)(c2  +  Cc  +  g2h) 


*Cc<«.h  -  C 


1  -  (ci  ♦  Cc>(C2  *  Cc>  ♦  cj  ♦  Cc(Cc-,wb> 


s  -  -c2 

c 


The  term  represents  the  time  step  at  which  Equation 

(3.22)  has  a  root  on  the  unit  circle.  For  our  test  circuit  the  values 

^mstable  *re  **vea  i®  Table  3.2.  If  the  time  step  becomes  greater 
ti**®  ^unstable  integration  becomes  nnericslly  unstable. 
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Furthermore,  the  root  loci  ia  Figure  3.3  ead  Figure  3.4  give 
more  details  shout  the  stability  of  the  modified  Gauss- Seidel  awthod 
for  the  stiff  esse.  The  roots  of  the  characteristic  equation  (3.22) 
are  a  fuaetioa  of  h.  Ia  order  for  the  modified  Gauss-Seidel  method  to 
be  aumerically  stable,  the  roots  of  Equation  (3.22)  must  lie  iaside 
the  unit  circle.  For  hfi  small  eaough.  this  is  the  case.  However,  we 
fouad  from  the  root  locus  that  if  h^  i8  much  larger  thaa  the  smallest 
time  coastaat  in  the  circuit  (approximately  a  factor  of  two) .  these 
roots  lie  outside  the  unit  circle.  For  example,  the  root  locus  of  the 
case  when  =»  0.1  F  sad  gj  ■  1000  S  is  shown  ia  Figure  3.3.  Note 
that  for  h  >  0.025  s  one  of  the  roots  lies  outside  the  unit  circle. 
From  Table  3.1  this  is  approximately  2t2.  Thus,  in  this  case  we  are 
coastraiaed  by  the  numerical  stability  properties  of  the  algorithm  to 
keep  the  step  size  in  the  neighborhood  of  the  smallest  time  constant 
over  the  entire  time  interval. 

Table  3.3  demonstrates  the  constraint  condition  of  the  modified 
Gauss-Seidel  algorithm.  The  analysis  time  interval  in  Table  3.3  is 
3*2*  where  is  the  larger  time  constant  of  the  system  in  each  case, 
such  that  the  system  can  reach  the  steady  state  during  the  analysis 
time  interval.  Tith  the  local  truncation  error  0.001,  the  number  of 
the  time  points  needed  for  each  case  is  counted  in  Table  3.3  to  see 
the  effect  of  the  system  stiffness  on  the  time  step.  It  is  found 
that  the  modified  Gauss-Seidel  method  takes  many  time  points  for  the 
stiff  systems.  For  example,  in  ease  9  with  Cq>i.o  F  and  g^-100  S, 

“  104.0  s,  it  takes  2650  time  points  in  3?^;  the  average  time  step  in 
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this  case  is  0.117  s  which  is  about  four  times  the  smallest  time  con¬ 


stant  tj  (=0.029  s) .  while  for  the  standard  circuit  simulation  tech¬ 


nique  it  takes  only  20  time  points  with  the  average  time  step  1 5.6  s. 


Another  instance  is  in  case  11  with  Cc=0.1  F  and  1^=1000  S,  = 
102.0  a,  and  it  takes  8122  time  points  in  3Tj,  the  average  time  step 


in  this  case  is  0.037  s  which  is  about  three  times  the  mealiest  time 


constant  x ^  (H1.012  s),  while  for  standard  circuit  simulation  it 


takes  only  44  time  points  with  the  average  time  step  6.96  s. 


Tftfelg  1-1. 


modified  Gauss-Seidel  standard  method 


335570 


xmm 


However*  for  thou  nonstiff  syttm  neh  as  eases  1  to  7,  both 
the  modified  Gauss-Seidel  method  aad  the  standard  eircait  siaalatioa 
technique  take  the  saae  nuaber  of  tine  points.  It  is  clear  fro* 
Table  3.3  that,  for  stiff  systeas,  the  tiae  steps  have  to  be  con¬ 
strained  to  the  order  of  the  aaallest  tiae  constants  of  the  systea  to 
keep  the  solutions  naaerically  stable. 

1.1  Cone lad ins  Eoaarfct 

1.14  Reaark  I 

In  the  above  tables*  the  paraaeters  ■  1,  Cj  ■  1,  *  1  and 
82  *  1*  the  floating  capacitor  CQ  and  the  transcondnctance  gB  are 
varied  to  change  the  stiffness  of  the  systea.  If  «  0,  the  systea 
in  Figure  3.1  contains  two  separated  subeircuits.  The  circuit  becoaes 
coupled  with  the  introduction  of  the  capacitor  C^.  if  the  coupling 
is  too  strong*  the  systea  becoaes  stiff  and  the  aodified  Gauss-Seidel 
method  does  not  work  well  as  seen  in  Table  3.3. 

The  conclusion  of  the  nuaerical  properties  of  the  tiae-point 
Gauss-Seidel  with  (or  without)  prediction  drawn  froa  this  linear  test 
circuit  is  that  [21*22],  if  the  coupling  eleaent  significantly 
affects  the  natural  frequencies  of  the  circuit  and  creates  a  stiff 
system,  then  no  advantage  is  gained  by  using  implicit  integration 
aethods  because  the  convergence  of  the  Gauss-Seidel  iterates  requires 
that  the  step  sixe  be  approximately  no  larger  than  the  smallest  time 
constant  in  the  circuit  over  the  entire  time  interval.  The 


computational  cost  can  be  significantly  increased  by  tbis  constraint, 
so  ancb  so  that  all  the  advantages  gained  by  decoupling  the  circuits 
cannot  only  be  lost,  bnt  the  computational  cost  might  even  exceed 
that  of  a  general  purpose  circuit  simulator. 

Figure  3.3  and  Figure  3.4  show  that  the  modified  Gauss-Seidel 
technique  goes  numerically  unstable  when  the  time  step  exceeds  the 
smallest  time  constant  of  the  system  by  a  factor  of  three  or  four. 
Given  the  above  results,  one  would  expect  the  modified  Gauss-Seidel 
method  to  perform  poorly  in  the  transient  analysis  of  NOS  circuits  in 
which  the  floating  gate-drain  capacitance  is  modeled.  In  such  cir¬ 
cuits  the  pole- split ting  phenomenon  [15]  occurs  when  the  transistors 
are  active.  The  degree  to  which  the  poles  split  determines  the 
stiffness  of  the  system,  and  is  determined  by  the  low  frequency  gain 
of  the  logic  gate,  e.g. ,  slope  of  the  dc  characteristic  in  the  active 
region.  However,  due  to  the  gate  delay  and  the  nonlinear  characteris¬ 
tics  of  the  IDS  transistor,  the  pole-splitting  phenomenon  does  not 
seem  to  be  very  strong  [10].  Figure  3.3  shows  that  the  modified 
Gauss-Seidel  method  works  well  for  the  3-stage  ring  oscillator. 

1.1.2  ISSAlk  li 

In  the  other  case,  suppose  there  is  a  coupling  capacitor  between 
two  subnetworks  whose  time  constants  are  already  widely  separated, 
that  is,  in  Figure  3.1  suppose  gj  -  1.0,  Cx  -  1,  and  Cj  «  1,  g2  “ 
1.0x10“*.  Table  3.4  shows  the  required  number  of  time  points  with  a 
variable  time  step  when  the  transconductanoe  gffl  -  i  and  the  floating 


capacitor  C.  i,  varied. 


where 


IltiS  1-1 


Ce(F)  Xj/Xj  method  I  method  II  method  III 


method  I  :  Modified  Gauss~  Seidel  Technique 
method  II  :  G ana a- Seidel  Technique 
method  III  :  Standard  Circuit  Simulation 


Table  3.4  shows  the  two  decoupled  subsystems  (Cg)  are  already 
stiff  and  the  coupling  capacitor  Cc  does  not  significantly  affect  the 
time  constants.  For  example,  with  «  0  one  subsystem  has  a  time 
constant  equal  to  1  s,  and  the  other  has  a  time  constant  equal  to 
10000  s.  Note  in  this  example  the  number  of  time  points  for  each  case 
in  Table  3.4  does  not  vary  significantly  as  the  coupling  capacitor 
changes.  Contrary  to  the  results  shown  in  Table  3.3,  in  this  case  the 
coupling  does  not  affect  the  convergence  and  stability  properties  too 
much.  This  conclusion  also  concurs  with  some  other  researcher's 
observations  [23]. 


Time  (ns) 

(b)  FP— 8318 

Fi*.  3.5  Applied  the  Modified  Gauss-Seidel  Method  to  the 
3-Stage  Ring  Oscillator. 

(Solid  Line  la  the  Exact  Solution,  Q  *  C  /C. .) 


In  [23],  Gear  studied  the  simulation  of  an  aircraft  which  was 
simplified  to  the  two-subsystem  model  as  shown  in  Figure  3.6. 

The  control  subsystem  reacts  very  rapidly  (being  electronic)  whereas 
the  flight  dynamic  subsystem  reacts  relatively  slowly,  being  a 
mechanical  change.  Because  the  dynamics  are  slow,  his  conclusion  was 
that,  there  can  be  very  little  coupling  from  the  control  to  the 
dynamics,  and  one  can  break  the  feedback  loop  from  the  dynamics  to 
the  control  and  handle  each  subsystem  separately.  Here  we  keep  * 
C2  »  1,  by  adjusting  g2  -  1  and  g2  *  1.0xl0~4  to  create  the  similar 
environment  as  in  [23]  and  achieve  the  same  conclusion. 

In  the  next  chapter,  the  waveform  relaxation  method  and  its 
convergence  and  stability  properties  will  be  discussed. 
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Fig.  3.6  Simplified  Simulation  Model. 


CHAPTER  4 


THE  WAVEFORM  RELAXATION  METHOD 
AND  ITS  NUMERICAL  PROPERTIES 


In  this  chapter,  we  will  discuss  a  second  avenue  by  which  the 
relaxation  of  nonlinear  systems  has  been  approached.  While  the  first 
approach,  presented  in  Chapter  3,  decomposes  the  system  into  several 
subsystems  at  the  level  of  the  difference  equations,  the  second 
decomposes  the  system  at  the  level  of  the  ordinary  differential  equa¬ 
tions.  This  alternative  relaxation  method  at  the  differential  equa¬ 
tion  level  must  deal  with  elements  in  function  spaces,  i.e.» 
waveforms.  Thus,  it  is  classified  as  the  waveform  relaxation  method. 
This  approach  began  with  the  work  which  led  to  the  RELAX  program  [9] . 
Either  the  Gauss-Jacobi  method  or  the  Gauss-Seidel  method  could  be 
used  in  the  waveform  relaxation  algorithm. 

A  brief  aiathematic  description  of  the  waveform  relaxation  method 
together  with  its  convergence  and  stability  properties  will  be  dis¬ 
cussed  in  this  chapter. 

4,.i  Mathematical  Formulation 

Let  us  recall  the  set  of  algebraic-differential  equations  in 
Equation  (3.1)  and  Equation  (3.2)  as  follows: 


where  x  s  Rp  is  the  unknown  ▼triable  at  tine  t  with  the  given  ini¬ 


tial  value  x  ;  z  is  the  tiae  derivative  of  z  at  tiae  t;  a  e  Rr  is 
o 

the  veotor  of  all  the  inputs  and  their  tiae  derivatives; 

f  :  R®  x  Rp  x  Rr  — >  RP  is  a  continuous  function*  and  E  s  R11*®,  n  ±  p 
is  a  aatrix  of  rank  n  such  that  E(x(t))  is  the  state  of  the  systea  at 
tiae  t.  Let  {  tj  •  i  =  0,1*. ..»N  }  denote  a  sequence  of  increasing 
tiae  points  selected  by  the  siaulator  with  tQ  ■>  0  and  t{|  “  T,  where  T 
is  the  given  siaulation  tiae  interval. 

The  general  fraae  of  the  wavefom  relaxation  algoritha  con¬ 
sists  of  two  aajor  processes,  naaely*  the  assignaent-partition  pro¬ 
cess  and  the  relaxation  process.  The  dynaaic  systea  is  decoupled  into 
certain  subsysteas  through  the  first  process*  while  the  second  one 
yields  the  wavefora  of  each  subsystea  at  each  iteration. 

£.1.1  The  Assixaaent-Partitloninx  Process 

In  this  section,  we  describe  the  first  process  of  the  wavefora 
relaxation  aethod,  that  is,  the  assignaent-partitioning  process,  to 
decouple  the  systea  of  nonlinear  algebraic-differential  equations 
into  subsysteas.  In  the  assignaent-partition  process,  each  unknown 
variable  is  assigned  to  an  equation  of  (4.1)  in  which  it  is  involved. 
However,  no  two  variables  can  be  assigned  to  the  saae  equation; 
therefore.  Equation  (4.1)  is  partitioned  into  a  disjoint  subsystems 
as  follows  : 


s 


i 
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JE4WVV 


£l<  d1#  u  )  -  0 


(4.3) 


*B(  xm,  xa,  da,n  )  «  0 


E(  x(0)“  xq  )  m  o 


(4.4) 


where  for  each  i  ■  1.  2,  . ..,a  x^  8  r  *  is  the  subveotor  of  unknown 
variables  assigned  to  the  i-th  partitioned  subsystem  and 


di  *  (  xl*  **'  xi-l*  xi+l*  *a 


(4.5) 


*1'  *•'  xi-l'  xi+l' 


vT 


In  the  i-th  snbsystea  f^,  and  Xj  ,  i  j  are  called  vector 
endogenous  and  exogenous  variables,  respectively.  If  we  treat  the 
vectors  d^,  i  »  1,  2,  ...  a  as  inputs,  then  (4.3)  can  be  solved  by 
solving  a  independent  subsystems.  In  other  words,  the  systea  has  been 
decoupled  into  a  subsysteas,  and  dj'*  are  called  the  decoupling  vec¬ 
tors  of  the  systea.  The  second  process,  relaxation  process,  will  be 
introduced  in  the  next  section. 


14‘1  Relaxation  ggggm 


The  relaxation  process  starts  with  an  initial  guess  of  the 
waveform  solutions  of  (4.1)  in  order  to  initialize  the  approximated 
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waveforms  of  the  decoupling  vector*.  It's  an  iteration  process;  dur¬ 
ing  each  iteration,  each  deeosiposed  subsystem  x^  is  solved  for  its 
endogenous  variables  in  the  entire  time  evolution  [  0,  T  ]  by  using 
the  approximated  waveform  of  its  decoupling  vector.  The  iterative 
process  is  carried  out  repeatedly  until  satisfactory  convergence  is 
achieved. 

The  actual  implementation  of  the  Waveform  relaxation  algorithm 
can  be  described  as  follows: 


i)  Assignment-partition  process 

Assign  the  unknown  variables  to  equations  in  Equation  (4.1) 
and  partition  Equation  (4.1)  into  m  subsystems  of  equations  as  by 
Equation  (4.3). 


of  the  relaxation  process 


Set  k  =*  1  and  guess  an  initial  waveform  (  x°(t);  ts[0,  T]); 

the  typical  guess  is  x°(t)  -  x(0)  for  all  t  s  [  0,  TJ. 

3.)  Analysis  of  the  decomposed  system  at  the  k-th  iteration 


i)  For  the  Gauss-Jacobi  Waveform  Relaxation  method 


for  each  i*  i,  2,  ...  m,  set 


5W 


t  _k-l  _k-l  _fc-l  _k-l 

1  X1  '  xi-l'  xi+l»  **»  xn  ' 

;t-i  :t-i  ;k-i  ik-i  \T 

xi  '  *•*  xi-i*  xi+i***  x*  ’ 


(4.6) 


ii)  For  the  Gauss- Sol del  Wave for*  Relaxation  method 

for  eaok  i  ■  i,  2,  ...  m,  aet 

,k  .  /  _k  _k  _k-l  _k-l 

di  1  xr  xi-l  *  xi+l  »  *  » 

(4.7) 

Ik  *k  *k-l 

With  the  decoupling  vector  d^*,  aolve  for  Ujk(t)  ; 

t  a  [  0,  T])  from  Equation  (4.3). 

£)  Iteration  loon 

Set  k  *  k  +  1  and  go  to  3) .  The  iteration  procesa  atopa  when 

the  difference  between  (xk(t);  te[  0,  T])  and  (xk-1(t);  te[  0,  T]), 

i.e.,  aax  ||  *.(t)  -  xv.(t)  II.  ia  sufficiently  snail, 
te  [O.TJ  x  ^  1 

In  the  next  section,  we  will  describe  the  modified  window 
waveforn  relaxation  method.  From  a  test  circnit,  it  ia  shown  that  the 
modified  technique  requires  less  CPU  time  and  memory  storage. 
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1.2  Jhe  Modified  Tavefora  Relaxation  Method  ,§s4  Ill 
Numerical  Prooertiee 

At  each  iteration,  the  wavefom  relaxation  method  decomposes 
the  systea  into  several  subsystems  each  of  which  is  analysed  for  the 
entire  given  tiae  interval.  The  accuracy  and  convergence  properties 
for  the  wavefom  relaxation  aethod  have  been  studied  in  [9];  however, 
the  huge  aeaory  storage  needed  to  store  the  waveforas  of  each  subsys- 
tea  in  the  entire  tiae  evolution  [  0,  T  ]  can  be  expensive  for  large 
systeas. 


m 


However,  in  soae  circnits  the  iterates  oscillate  about  the  solu¬ 
tion.  In  these  cases,  at  each  iteration  only  a  specific  part  of  the 
wavefom  in  each  subsystea  can  be  useful  in  analyzing  other  subsys¬ 
tems.  Under  this  condition,  sweeping  through  the  entire  tiae  evolu¬ 
tion  [  0,  T  ]  is  scaewhat  of  a  waste  in  either  CPU  tiae  or  aeaory 
storage.  For  example,  for  a  three-stage  ring  oscillator  in  Figure 

4.1,  Figure  4.2  contains  the  solution  waveforas  of  the  circuit  after 
each  iteration.  For  the  wavefom  of  the  first  iteration  in  Figure 

4.2,  we  can  find  that  the  information  beyond  t^  is  quite  different 
from  that  of  the  actual  wavefom,  hence  it's  meaningless  using  the 
wavefom  beyond  t^  to  analyze  other  snbsysteas. 

In  the  next  section,  we  use  the  circuit  in  Figure  4.1  as  an 
exaaple  to  illustrate  the  basic  concept  of  the  modified  wavefom 
relaxation  algorithm. 
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(•)  A  JOS  Ring  Oscillator 


4.  «2.1  The  Modified  Wave  fora  Relaxation  Method 

For  the  circuit  in  Figure  4.1*  if  we  cut  the  entire  time  evolu¬ 
tion  [  0,  T  ]  into  n  +  1  time  'Windows",  i.e.,  [  0,  tj  ] ,  [  tj,  tj  ], 

• • »  I  tn,  T  ] ,  and*  instead  of  sweeping  the  iterations  in  [  0,  T  ] , 
we  sweep  the  iterations  in  each  tine  'Window".  In  other  words,  after 
reaching  the  convergent  wavefom  in  window  1  (i.e.,  [  0,  t^  ])  then 
process  the  iteration  sweep  in  window  2  (i.e.,[  tj,  t2  ]),  ...,  even¬ 
tually  obtaining  the  wavefoms  in  each  tine  window.  Concatenating 
these  wavefoms  yields  the  wavefom  in  the  entire  tiae  evolution  [  0, 
T  ].  This  is  the  basic  idea  of  the  au>dified  window  wavefom  relaxa¬ 
tion  aethod  [12] . 

The  fundaaental  algorithm  of  the  original  Ganss-Seidel  Vavefom 
relaxation  aethod  in  [9]  can  be  described  as  follows  : 

The  Ganss-Seidel  Wavefom  relaxation  alaoritha 
BEGIN 

x  *  [  Voltages,  Currents  ] 
n  *  1 

WHILE  (  6n  <  Tolerance  )  DO 
BEGIN 

TOR  subsystem  i,  i  -  1  TO  a  DO 
BEGIN 

FOR  tiae  t  *  0  TO  t  ■>  T  DO 
BEGIN 


Solve  nonlinear  equations 


ln+1  _  f  /  n+1  _n+l  _n*-l  _n  _n> 

i  ri(xl  '•**xi-l*xi  ,xi+l*“*V 


xf^O)  -  x“(0)  -  x 
1  io 


END  {  sweep  ■  subsystems  } 


ft®4-1  -  max  „x  ||  x^tt)  -  xn(t)  II 
i  t 


n  »  n  +  1 


END  (  waveform  iteration  loop  } 


While  the  Modified  Window  Waveform  relaxation  method  is  of  the 


following  form  : 


The  Modified  Window  Waveform  relaxation  slyor lth» 


BEGIN 


x  “  [  Voltages,  Currents  ] 
w  ■  [  Windows  ] 


j  “  1 


BEGIN 


FOR  window  j ,  j  -  1  TO  k  DO 


BEGIN 


FOR  subsystem  i,  i  -  1  TO  a  DO 


WHILE  (  6n  <  Tolerance  )  DO 


BEGIN 

FOR  tine  t  «  0  TO  t  *  t ^  DO 
BEGIN 

Solve  nonlinear  eqnationa 


if! 


r  f  n+ 1 
fi(xl 


„u+l  _n+l  _n 
,xi-l'xi  ,x 


i+1 *  *  *  *  at 


x“) 


and 


xf  X(0)  -  xj(0>  =  xo 
END 

END  {  sweep  m  subsystems  } 

Sn+1  -  oaxaiax  II  xn+1(t)  -  xn(t)  II 
i  t 


n  »  n  +  1 

END  {  waveform  iteration  loop  } 
j  -  j  +  1 

END  {  window  process  } 

END 


For  the  ring  oscillator  circuit.  Figure  4.2a  and  Figure  4.2b 
show  the  waveforms  after  each  iteration  taking  the  whole  time  inter¬ 
val  [  0,  T  ]  as  the  time  window,  whereas  Figure  4.3a  and  Figure  4.3b 
show  the  waveforms  for  taking  the  window  size  half  of  the  entire  time 


interval  [  0,  T  ].  From  Figure  4.2a  and  Figure  4.2b  ve  find  that  the 
number  of  iterationa  inereaaea  aa  the  coupling  capacitors  increase. 
In  Figure  4.2a.  when  there  is  no  coupling  capacitors,  it  takes  4 
iterationa  to  achieve  convergence.  In  Figure  4.2b  with  the  introduc¬ 
ing  of  the  coupling  capacitors,  it  takes  10  iterations  to  achieve 
convergence. 

Comparing  the  waveforms  in  Figure  4.2b  with  the  waveforms  in 
Figure  4.3b,  one  can  find  the  effect  of  the  window  sizes  on  the 
waveforms.  In  Figure  4.2b  the  converged  part  of  the  waveforms 
increases  (propagates)  about  a  half  cycle  per  iteration,  while  in 
Figure  4.3b  it  increases  (propagates)  about  one  cycle  per  iteration 
and,  hence,  needs  a  fewer  number  of  iterations  than  in  Figure  4.2b  to 
achieve  convergence  for  the  entire  time  interval.  It  is  found  that 
in  Figure  4.3b  it  takes  6  iterations  to  achieve  convergence,  while  it 
takes  10  iterations  to  achieve  convergence  in  Figure  4.2b. 

During  the  iterations,  the  waveforms  oscillate  about  the  exact 
solution.  Table  4.1  shows  the  results  for  different  sizes  of  the  time 
window  under  different  degrees  of  coupling  capacitance. 


Table  14 

r  of  Iterations  in  each  window  to  reach  converse 


window  size 


*  (Q  is  the  ratio  of  floating  capacitance  to  the  ground  capacitance) 
**  Use  fixed  tiae  step  h  •  T/SOO  in  each  window. 


*1 

Ml 


It  is  easy  to  conclude  that  the  Modified  window  waveform  relaxa¬ 
tion  method  achieves  more  accurate  results  with  less  memory  storage 
and  less  CPTJ  tiae.  The  problem  that  arises  with  the  introduction  of 
the  modified  waveform  relaxation  algorithm  is  how  to  choose  the  loca¬ 
tion  and  size  of  the  windows.  In  other  words,  how  should  the  tiae 


interval  be  cut  into  several  subintervals  dynamically!  This  is  an 
interesting  open  topic.  In  the  next  section  we  study  the  convergence 
properties  of  the  waveform  relaxation  methods  as  a  function  of  the 
coupling  and  window  size. 


£.2.2  The  Numerical  Properties  of  The  Waveform  Relaxation  Me thod 

Mathematically,  the  convergence  of  the  waveform  relaxation 
method  has  been  discnssed  in  T9] .  For  an  important  class  of  dynamic 
systems,  MOS  digital  integrated  circuits,  it  is  concluded  [9]  that 
for  IDS  circuits  with  a  grounded  capacitor  at  each  node  convergence 
is  guaranteed  for  any  arbitrary  piecewise  continuous  set  of  initial 
waveforms  for  the  node  voltage  of  the  circuit. 

In  this  research,  we  would  like  to  study  the  numerical  proper¬ 
ties  of  the  waveform  relaxation  algorithm  from  a  different  point  of 
view:  the  effect  of  the  system  stiffness  on  this  algorithm.  Practi¬ 
cally,.  in  order  to  investigate  the  numerical  properties  of  the  modi¬ 
fied  waveform  relaxation  method  ,  we  used  the  same  linear  model  of 
the  IDS  inverter  as  shown  in  Figure  3.1  to  see  the  effect  of  stiff¬ 
ness  on  the  transient  analysis  of  MOS  circuits.  The  implementation 
of  the  waveform  relaxation  algorithm  to  this  linear  test  circuit  is 
shown  in  Figure  4.4.  In  Figure  4.4a,  at  the  (k+l)-th  iteration,  the 
node  voltage  vjj^Ct)  is  determined  with  the  node  voltage  Vjft)  frozen 
at  v£(t)  which  is  the  waveform  at  node  #2  at  k-th  iteration.  Once 
v^^Ct)  is  found,  we  go  to  Figure  4.4b  and  place  a  grounded  voltage 
source  v*+*(t)  at  no,*e  ^  sa<*  *°lv®  for  v£+*(t),  then  repeat  the 
iteration  until  convergence  is  achieved. 

If  the  backward  Euler  formula  is  used,  the  iterative  procedures 
are  described  by  following  equations: 


<  ^  ♦•!>  ^  -  £  <  'T.l  -  > 

a  an 


Cc  v 

■  i;  ’i.mi  ■  vw 


(4.8) 


< «2*^»  «.  **.»  -  ?  <i 

a  a 


(4.9) 


+  ?  <  if"1  -  1  >  -  T»  T**L,  -  o 

k  1  »n  2 1&  li  1  #  tt+1 

n  li 


where  h^  i*  the  size  of  the  tiae  step  sad  Tj  is  the  voltage  of 
node  #1  zt  the  (k+l)-th  iteratioa  st  (n+l)-th  tiae  poiat.  Equation 
(4.8)  sad  Equation  (4.9)  sre  solved  iteratively  aatil  convergence  is 


achieved. 


The  Mdified  wavefora  relaxation  aethod  solves  (4.8)  aad  (4.9) 
ia  each  tine  window  aad  concatenates  the  resalts  yielding  the 
wavefora  ia  the  entire  tine  interval.  Table  4.2  shows  the  repaired 
aaaber  of  iterations  in  each  wiadow  to  get  a  convergent  solatioa. 
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T*ble  4.2 


A 

A 

rt 
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1>I 


>a 
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Cc  g^  x2m~*‘2  ^2^1  Node  #*  eq. 


01  1000 


4.3  0.70 


12.1  0.099 


a*:greater  then  300 
b*:greater  than  600 
c*:greater  than  1000 
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The  results  in  Table  4.2  were  obtained  with  a  fixed  step  size  h 
0.1  s.  However,  froa  Figure  4.3  we  find  that  siailar  results  were 


obtained  with  h  •  0.01  s  and  h  *  0.001  s,  because  in  stiff  systeas 
the  convergence  rate  of  the  wavefom  relaxation  aethod  depends  on  the 
window  size,  that  is,  the  tiae  inoreaent  over  which  the  iteration  is 
perforaed.  Figure  4.6  graphically  illustrates  the  convergence  charac¬ 
teristics  of  the  wavefora  relaxation  aethod  for  a  given  stiff  systea. 
For  exaaple,  in  the  case  ■  0.01  F  and  gB  »  1000  S,  the  convergence 
rate  as  a  function  of  the  window  size  is  shown  in  Table  4.3  for  a 
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fixed  step  size  h  ■  0.1  s. 


Number  of  Iterations  to  Converge 


0  1:  r  *  140,  h  «  0.1 
•  2:  r  -  t40,  h  -  0.01 
A  3:  r  *  122,  h  ■  0.1 
□  4:  r »  122,  h- 0.01 
A  5:  r  *  63.3,  h  *  0.1 
7  6:  r  *  63.3,  h  »  0.01 
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Fig,  4.5  forefoot  Relaxation  Method  Converge*  as  »  Function 
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Table  4.3 


Thus,  in  the  first  esse  the  waveforms  were  relaxed  over  the 
entire  4  s  interval  and  over  600  iterations  were  required  for  the 
iterates  to  converge  to  the  solution.  When  the  waveforms  were  parti¬ 
tioned  into  four  windows,  200  iterations  in  each  window  were  required 
to  achieve  convergence.  Finally,  Table  4.3  shows  that  rapid  conver¬ 
gence  of  the  waveform  relaxation  method  was  not  achieved  until  the 
waveforms  were  partitioned  into  forty  0.1  s  windows.  In  this  case, 
the  window  size  (0.1  s)  is  approximately  limited  to  the  smallest  time 
constants  (0.085  s)  to  achieve  reasonable  convergence  speed. 

4 -2-1  2hfi  Wave form  Relaxation  Method  Applied  to  the  State  Equations 

The  equations  for  the  test  circuit  in  Figure  3.1  can  be  rewrit¬ 
ten  in  the  following  state  equation  format  assuming  i,(t)  -  0, 


A<1,1)  -  t  -<C2  +  cc)tl  -  Ccf-  ]/DE 
A(l,2)  -  (  -Cog2  ]/DE 
A(2,l)  -  I  “(Cj  +  -  Ccgj  ]/DE 

A(2»2)  -  [  “(C2  +  Cc)g2  ]/DE 

and 

DE  -  Ctc2  +  Cc  (  Cx  +  Cj  ) 

Applying  the  wavefota  relaxation  aethod  to  Equation  (4.10)  yielda 
v^1  -  A(l.l)  y*+1  +  A(l,2)  y* 

^+1  -  A(2,l)  y*+1  +  A(2.2>  y*1 
Applying  the  backward  Euler  f  omul  a  to  Equation  (4.11)  yielda 


1/Aa-A(l,l)  0 

k+1 

vl,n+l 

fc+1  k 

vl,n/hn  +  A(1'2)v2.n+1 

-A(2,l)  1/h -A(2,2) 

k+1 

SI 

y*1.  /h_ 

69 


Solving  Equation  (4.12)  gives  the  results  in  the  last  colon  of 
Table  4.2.  It  is  found  that  if  we  process  the  systea  frost  the 
viewpoint  of  the  state  equation  instead  of  the  node  equation,  then 
the  wavefom  relaxation  technique  is  not  affected  by  the  stiffness  of 
the  systea.  Unfortunately,  in  a  practical  iapleaentation,  it  is  hard 
to  fomulate  a  large  systea  on  the  coaputer  in  the  fora  of  the  state 
equation. 

1.1  Cone lud ins  Renarks 

Table  4.2  shows  that  the  wavefora  relaxation  aethod  does  not 
work  well  for  the  analysis  of  linear  stiff  systeas.  The  waveforas  of 
each  iteration  oscillate  about  the  solution,  and  an  excessive  nuaber 
of  iterations  are  required  for  convergence  [21,22],  unless  the  tiae 
windows  are  approxiaately  the  size  of  the  saallest  tiae  constant  of 
the  systea.  Since  the  tiae  window  is  liaited  to  the  size  of  the  saal- 
lest  tiae  constant,  in  order  to  achieve  convergence  in  a  few  itera¬ 
tions,  an  excessive  nuaber  of  tiae  windows  are  required  to  cover  one 
siaple  transition  of  the  wavefora. 

The  accuracy  and  convergence  properties  for  the  wavefora  relax¬ 
ation  aethod  have  been  studied  in  [9].  In  this  chapter,  we  have  shown 
that,  in  the  aodified  wavefora  relaxation  aethod,  the  flexibility  of 
setting  the  tiae  window  saves  both  the  coaputation  tiae  and  the 
aeaory  storage.  Although  the  convergence  of  the  original  wavefora 
relaxation  aethod  is  guaranteed  for  any  arbitrary  piecewise  continu¬ 
ous  set  of  initial  waveforas  [9],  its  convergence  is  slow  in  stiff 


CHAPTER  5 


EVENT-DRIVEN  AND  LATENCY  SCHEMES 

Previous  works  such  us  MOTIS-C,  SPLICE  and  RELAX  showed  that 
considerable  improvement  in  speed  can  be  achieved  with  the  exploita¬ 
tion  of  relaxation  techniques.  But  they  were  limited  to  the  simula¬ 
tion  of  MDS  circuits.  A  dominant  factor  which  yields  the  performance 
improvement  for  these  simulators  is  the  use  of  simplified  device 
models.  Hence,  in  this  study  we  investigate  the  performance  of  the 
time-point  relaxation  method  for  the  simulation  of  both  MOS  and  bipo¬ 
lar  digital  circuit  technologies.  Performance  is  also  investigated  as 
a  function  of  the  device  model. 

The  time-point  relaxation  method  is  implemented  in  the  general 
purpose  circuit  simulator  SLATE  [8],  and  contrary  to  other  relaxation 
simulators,  such  as  MOTTS,  SPLICE,  and  RELAX,  accurate  analytical 
device  models  are  included  in  the  simulator. 

1 4  Brief  Review  of  the  SLATE  Prosram 

The  tearing  approach  [8]  decomposes  a  system  into  certain  sub¬ 
systems.  There  are  two  tearing  methods,  namely,  the  node  tearing 
decomposition  and  the  branch  tearing  decomposition.  The  former  is 
preferred  [8]  and  is  implemented  in  the  SLATE  program.  As  shown  in 
Figure  2.3,  the  entire  circuit  is  decomposed  into  several  subcix- 


tion  aatrix  equation  as  follows: 


(5.2) 


where 

Ttt  "  Ytt  Ytsi(Tsi)  lT.ti 

and 

Tts  “  Jts  Ytsi(Ysi>  ljssi 

Equation  (5.2)  is  solved  to  obtain  eolations  for  vt  *nd  v  ,  and  next 
each  sobcircnit  can  be  solved  by  using  backward  substitution.  For 
exaaple,  the  resulting  aodified  nodal  equation  in  SLATE  for  the  cir¬ 
cuit  in  Figure  5.1  yields: 


systems  and  requires  s  large  amount  of  memory,  whereas  the  time-point 
relaxation  technique  only  s offers  the  former  weakness.  Thus,  it  was 
concluded  that  one  is  better  off  nsing  the  time-point  relaxation 
method  in  a  special  purpose  circuit  simulator  for  digital  circuits. 

In  the  next  chapter  we  describe  how  the  SLATE  program  [8]  was 
modified  to  include  time-point  relaxation.  Then  its  performance  for 
the  transient  analysis  of  digital  circuits  implemented  in  various 
technologies  is  given. 
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cuits.  The  nodes  of  each  subcircuit  can  be  classified  as  the  internal 
nodes  and  the  external  (tearing)  nodes.  After  partitioning  (which  is 
done  by  the  nser) ,  the  initial  step  in  the  solution  strategy  of  the 
node  tearing  method  it  to  reorder  the  nodes  snoh  that  all  the  tearing 
nodes  are  located  on  the  border  while  the  internal  nodes  for  each 
snbcircnit  are  located  on  the  diagonal  blocks.  The  resulting  modi¬ 
fied  nodal  equation  of  the  node  tearing  decomposition  from  [8]  is  as 
f ollows: 
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where  v  ^  is  the  set  of  the  internal  nodes  for  the  i-th  subcircuit, 
▼t  is  the  set  of  all  the  tearing  nodes,  and  vr  is  the  set  of  the 
nodes  for  the  'kest"  of  the  circuit  ('fcest"i  the  remaining  part  of 
the  circuit  after  all  the  suboircuits  are  removed). 


The  procedures  for  solving  the  equations  in  SLATE  are  first: 
eliminate  all  the  via  LU  factorization  to  get  the  interconnec¬ 
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We  use  the  same  circuit  ia  Figure  5.1  to  illustrate  the  parti¬ 
tioning  and  solution  procedures  in  the  inpleaentation  of  the  relaxa¬ 
tion  technique.  In  order  to  apply  the  modified  Gauss-Seidel 
approach.  Equation  (5.3)  is  partitioned  as  follows: 


142356789  10  11 


From  the  algorithm  of  the  modified  Gauss-Seidel  method,  v^  and 

*  .  • 

▼g  are  relaxed  by  using  a  predictor  v^  for  Vj,  and  a  predictor  vg  tot 
▼g;  then,  the  variables  V2>  Vg  and  Vj  of  the  first  subcircuit  are 
solved  from  the  3x3  matrix.  The  rest  of  the  subcircuits  are  solved 
one  at  a  time  sequentially  in  a  similar  manner.  In  order  to  use  the 
same  matrix  pointers  for  identical  subcircuits,  the  tearing  nodes  and 
the  DC  power  supply  nodes  are  split  in  program  implementation,  such 


die.  From  the  viewpoint  of  circuits,  the  topological  ordering  ie 
possible  only  for  one-way  circnits.  If  feedback  loops  exist  in  the 
oirenit.  the  directed  graph  6  is  no  longer  acyclic.  The  directed 
graph  6  is  said  to  have  strongly  connected  eoaponents  (SCC)  for  cir¬ 
cnits  with  feedback.  The  strongly  connected  eoaponents  of  the 
directed  graph  6  can  be  found  in  linear  tine  complexity  by  using 
Tarjan's  algorithm  [24]. 

For  the  circnits  with  feedback  loops,  basically,  there  are  two 
approaches  for  sequencing  the  vertices  in  the  directed  graph  6.  One 
is  to  contract  the  strongly  connected  components  into  one  new  suboir- 
cuit,  which  results  in  a  new  acyclic  directed  graph  6'  [25].  The 
problem  with  this  approach  is  that  the  size  of  the  subcircuits  after 
contraction  could  become  too  large  for  the  analysis  to  be  efficient. 
In  large-scale  circuit  simulation  one  should  always  try  to  keep  the 
size  of  the  suboircuits  small  to  make  the  analysis  time  linearly  pro¬ 
portional  to  the  size  of  the  entire  circuit.  The  other  approach 
processes  the  directed  graph  G  directly  without  any  contraction  by 
breaking  the  feedback  loops.  Predictors  are  used  for  the  decoupled 
terms. 

The  algorithm  used  in  SLATE- R  combines  the  above  two  approaches. 
First  of  all,  the  Tarjan’s  algorithm  is  implemented  to  detect  the 
strongly  connected  components  (SCC)  in  the  directed  graph  G  and  col¬ 
lapses  each  SCC  into  one  new  vertex  which  results  in  a  new  acyclic 
directed  graph  G’ .  Then  a  topological  ordering  is  chosen  as  the  order 
in  which  the  vertices  in  the  new  directed  graph  G'  are  processed  dur- 
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done  only  once.  The  other  benefit  for  using  the  tearing  method  is 
the  exploitation  of  latency.  During  the  analysis,  only  the  active 
parts  of  the  circuit  need  to  be  solved  and  this  rednces  the  computa- 
tional  time  considerably. 

However,  there  are  additional  features  that  could  be  added  to 
SLATE  to  shorten  the  simulation  time.  For  example,  no  event  schedul¬ 
ing  techniques  are  being  used  in  SLATE,  and  there  is  no  decoupling 
scheme  in  SLATE  so  that  the  entire  cirouit  matrix  must  be  inverted. 
The  idea  and  implementation  of  the  SLATE- R  program  (a  Simulator  with 
Latency  and  Tearing  — Relaxed  version)  are  shown  in  the  next  sec¬ 
tions. 

£.2  The  Relaxation  Technique 

The  first  procedure  of  the  relaxation  decomposition  is  the  same 
as  that  of  the  tearing  decomposition:  partition  the  circuit  into  sub¬ 
circuits.  Current  versions  of  those  simulators  which  use  the  decom¬ 
position  technique  partition  the  circuit  via  the  definition  of  sub- 
circuits  that  is  specified  by  the  user.  The  only  difference  in  the 
input  processes  of  the  circuit  file  between  SLATE  and  SLATE-R  is 
SLA1E-R  has  to  identify  the  fan-in  and  fanrout  nodes  for  each  subcir¬ 
cuit  while  SLATE  only  needs  to  reoord  the  externsl  nodes  no  matter  if 
they  are  fan-in  or  fan-out  nodes.  In  SLATE-R  the  circuit  input  file 
is  changed  such  that  it  is  easy  to  identify  the  fan-in  nodes  and 
fan-out  nodes.  In  the  next  chapter  we  describe  the  process  of  the 
cirouit  input  file  in  the  SLATE-R  program. 
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We  use  the  sue  circuit  in  Figure  S.l  to  illustrete  the  parti¬ 
tioning  and  solution  procedures  in  the  implementation  of  the  relaxa¬ 
tion  technique.  In  order  to  spply  the  modified  Gauss- Seidel 
approach.  Equation  (5.3)  is  partitioned  as  follows: 
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From  the  algorithm  of  the  modified  Gauss-Seidel  method,  and 
▼g  are  relaxed  by  using  a  predictor  Vy  for  Vj,  and  a  predictor  vg  for 
▼g ;  then,  the  variables  Vj,  Vg  and  Vg  of  the  first  subcircuit  are 
solved  from  the  3x3  matrix.  The  rest  of  the  subcircuits  are  solved 
one  at  a  time  sequentially  in  a  similar  manner.  In  order  to  use  the 
same  matrix  pointers  for  identical  subcircuits,  the  tearing  nodes  and 
the  DC  power  supply  nodes  are  split  in  progru  implementation,  such 


that  each  identical  subcircuit  has  the  sane  matrix  formation.  For  the 
example  circuit  in  Figure  5.1,  the  subcircuit  one  contains  nodes  1, 
2,  3,  4  and  5;  subcircuit  two  contains  nodes  1,  6,  7,  2  and  8,  with 
two  voltage  sources  (and  two  corresponding  current  variables)  in  each 
subcircuit,  the  relaxation  approach  needs  to  solve  three  7x7  subma¬ 
trices,  while  the  SLATE  program  is  to  solve  one  13x13  matrix. 

5..1  Event- Driven  Technique 

In  the  solution  strategy  of  the  relaxation  techniques,  once  the 
whole  circuit  has  been  partitioned  into  several  subcircuits,  the  next 
procedure,  which  is  called  the  event-driven  technique,  is  to  sequence 
the  subcircuits  to  be  simulated,  i.e.,  to  select  a  well-chosen  order¬ 
ing  in  which  the  subcircuits  are  to  be  processed.  In  our  research, 
event-driven  algorithms  will  be  implemented  on  the  basis  of  fan-in 
fan-out  topologies,  such  that  the  resulting  modified  nodal  equation 
is  in  the  form  of  Equation  (5.4)  with  the  internal  nodes  and  fan-out 
nodes  of  each  subcircuit  in  diagonal  blocks. 

A  cirouit  that  is  composed  of  unilateral  subcircuits  can  be 
represented  by  a  directed  graph  G(V,E),  where  each  vertex  in  V 
corresponds  to  each  subcircuit  and  each  edge  in  E  corresponds  to  each 
signal  line  from  fan-out  to  fan-in.  A  "gooff1  ordering  will  be  one  in 
which  a  subcircuit  is  processed  only  after  all  its  fan-in  subcircuits 
have  already  been  processed;  in  other  words,  the  event-driven  tech¬ 
nique  is  to  arrange  the  vertices  in  a  topological  order.  However,  the 
directed  graph  G  has  a  topological  ordering  if  and  only  if  it  is  acy- 


clio.  From  the  viewpoint  of  circnit*,  the  topological  ordering  is 
possible  only  for  one-way  circnits.  If  feedback  loops  exist  in  the 
circnit,  the  directed  graph  6  is  no  longer  acyclic.  The  directed 
graph  6  is  said  to  have  strongly  connected  components  (SCC)  for  cir¬ 
cnits  with  feedback.  The  strongly  connected  components  of  the 
directed  graph  6  can  be  found  in  linear  time  coaplexity  by  using 
Tarjan's  algorithm  [24]. 

For  the  circnits  with  feedback  loops,  basically,  there  are  two 
approaches  for  sequencing  the  vertices  in  the  directed  graph  6.  One 
is  to  contract  the  strongly  connected  components  into  one  new  subcir- 
cnit,  which  results  in  a  new  acyclic  directed  graph  G'  [25].  The 
problem  with  this  approach  is  that  the  size  of  the  subcircuits  after 
contraction  could  become  too  large  for  the  analysis  to  be  efficient. 
In  large-scale  circuit  simulation  one  should  always  try  to  keep  the 
size  of  the  subcircuits  small  to  make  the  analysis  time  linearly  pro¬ 
portional  to  the  size  of  the  entire  circuit.  The  other  approach 
processes  the  directed  graph  G  directly  without  any  contraction  by 
breaking  the  feedback  loops.  Predictors  are  used  for  the  decoupled 
terms. 

The  algorithm  used  in  SLATE-R  combines  the  above  two  approaches. 
First  of  all,  the  Tarjan's  algorithm  is  implemented  to  detect  the 
strongly  connected  components  (SCC)  in  the  directed  graph  G  and  col¬ 
lapses  each  SCC  into  one  new  vertex  which  results  in  a  new  acyclic 
directed  graph  G' .  Then  a  topological  ordering  is  chosen  as  the  order 
in  which  the  vertices  in  the  new  directed  graph  G'  are  processed  dur- 
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ing  each  iteration.  In  order  to  aet  np  the  order  in  vhich  the  subcir- 
onits  within  each  SCC  are  processed,  the  following  algorithm  [26]  is 
implemented  to  set  np  the  analysis  sequences.  In  order  to  describe 
the  algorithm,  the  following  notations  are  introduced: 

GgC0(V,E>:  the  directed  graph  of  each  SCC. 

adj(v):  set  of  adjacent  vertices  corresponds  to  the  set  of  ver¬ 
tices  with  an  edge  which  fans  out  to  vertex  v. 

la(v):  label  of  vertex  v. 

Procedure 

[1]  Set  lafv^)  m  o  for  each  vertex  v^  of  Ggcc<V,E) 

[2]  2.  la(v^)»i  for  each  vertex  Vj  which  corresponds  to  an  input 
signal  terminal. 

k-1. 

[3]  k-k+1 

Choose  a  vertex  v^  where  la(vj)»0  and  la(v^)^0  for  all  v^  s 
adj(Vj).  if  there  is  no  such  vertex,  choose  a  vertex  Vj  connect¬ 
ing  to  a  vertex  which  has  the  lowest  label. 

la(Vj)-k. 

[4]  Repeat  step  (3)  until  all  the  vertices  in  G.„(V,E)  are  labeled. 

It  is  claimed  in  [26]  that  the  above  algorithm  can  find  all 
feedback  loops  which  will  be  broken  during  the  analysis  sequencing. 


For  arbitrary  networks,  this  algorithm  nay  sot  be  satisfactory  in 
identifying  ainiaal  feedback  loops  as  other  coaplez  algorithms  do 
(27].  However#  becanse  the  accurate  device  aodels  are  need  in  SLATE, 


coupling  parameters  such  as  the  floating  capacitor  froa  drain  to  gate 
in  MOS  devices  and  the  terminal  reaistance  at  each  leg  of  the  bipolar 
transistor  devices  will  cause  the  subcircuits  to  have  a  certain 
degree  of  coupling.  In  this  ease,  our  solution  strategy  is  to  label 
the  vertices  in  sense  of  'tiepth- first  search'  approach  and  to  use  a 
predictor  to  cut  all  the  feedback  loops.  Another  fact  is  that  itera¬ 
tions  among  subcircuits  are  continued  until  convergence  is  reached. 
The  worst  case  for  'Improper"  ordering  is  at  most  the  necessity  to 
process  one  more  iteration  at  each  time  point.  General  algorithms 
are  not  cost  effective  because  the  complezity  grows  exponentially 
with  the  size  of  the  network  [27] .  For  these  two  reasons  we  imple¬ 
ment  the  simple  analysis  sequencing  algorithm  in  our  program.  In  the 
next  section,  we  will  describe  the  concept  and  experimental  results 
of  latency  exploitation. 

£•£  LillMSZ  Exploitation 

In  large-scale  circuit  analysis,  a  part  of  the  cirouit  may  be 
not  active  at  any  given  time  and  at  any  particular  iteration.  This 
phenomenon  is  called  latency  [28].  Exploitation  of  the  latency  can 
provide  savings  in  CPTJ  time.  In  the  program  SLATE,  the  latency  con¬ 
cept  is  implemented  at  three  levels:  (1)  device  level;  (2)  subcircuit 
level;  and  (3)  network  level. 
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At  the  device  level,  the  operating  point  of  each  nonlinear  dev¬ 
ice  is  aonitored.  If  the  operating  point  does  not  change  signifi¬ 
cantly  between  tiae  points  or  Newton-Raphson  iterations,  the  device 
aodels  need  not  be  reevaluated,  and  the  aatrix  entries  eoapnted  at 
the  previous  tiae  point  or  the  previous  iteration  are  used  again. 
This  level  of  latency  is  also  called  the  device  bypass  level  which  is 
used  in  SPICE2  and  SPLICE.  Because  the  node  tearing  aethod  is  iaple- 
uented  in  SLATE,  the  latency  strategy  can  be  used  in  the  network 
level  through  the  exploitation  of  the  substitution  theorea  in  the 
formulation  of  interconnections  [8] .  While  with  the  relaxation 
method,  there  is  no  bordered  interconnection  matrix,  in  the  SLATE- R 
program  we  only  consider  the  exploitation  of  latency  at  the  bypass 
level  and  subcircuit  level. 

The  relaxation  aethod  deals  with  each  subcircuit  individually, 
with  the  use  of  the  predictor  to  decouple  the  coupling  terms.  There¬ 
fore  the  exploitation  of  the  latency  can  be  iaplemented  at  the  sub¬ 
circuit  level,  either  in  the  Newton-Raphson  iteration  or  at  the  tiae 
point  level.  The  idea  is  that  if  there  is  no  significant  change 
between  Newton-Raphson  iterations  or  tiae  points,  then  the  latent 
subcircuit  can  be  skipped  in  the  analysis.  The  latent  status  of  a 
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The  Latency  Criterion 


For  the  snbcirenit  N^,  denote  the  fan-in  node  voltagea  as  y^k  , 

P 

P*1i2m,.,  and  the  internal  and  ontpnt  node  yoltagea  of  N^  as  yQk  , 

4 

4*1,2,..  .  The  following  latency  scheme  is  implemented  in  the 

SLATE- R  program. 


Latency  Scheme; 

A  snbcirenit  N  is  considered  to  be  latent  at  time  t_  if  the 

I  11 

following  two  conditions  are  satisfied: 
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If  the  snbcirenit  is  not  latent  in  time,  we  can  still  consider 
the  latency  in  Newton-Raphson  iterations.  A  snbcirenit  is  con¬ 
sidered  to  be  latent  at  time  tfi  during  the  ith  Newton-Raphson 


iterations  if  the  following  two  conditions  are  satisfied: 
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S,.±.2  Experimental  Eesnlts  for  Latency  Exploitation 

The  latency  scheme  described  in  the  last  section  has  been  suc¬ 
cessfully  implemented  into  the  relaxation  version  of  the  program 
SLATE.  Table  5.1  and  Table  5.2  give  the  simulation  data  corresponding 
to  the  circuits  shown  in  Figure  5.2  and  Figure  5.3  respectively.  In 
order  to  see  the  latency  exploitation  at  the  Newton-Raphson  iteration 
level,  here  the  dc  analysis  is  performed,  while  Table  5.3  gives  the 
simulation  data  for  the  circuit  shown  in  Figure  5.4  for  both  DC 
analysis  and  transient  analysis. 


Table  5.1  Sinulation  data  of  a  DC  analysis 
for  the  MOS  circuit  in  Figure  5 .2 


DC  Analysis 

With  Latency 

Without  Latency 

#  of  subcircuits 
tines  #  of 
iterations 

275 

275 

#  of  nonlatent 
subcircuits  tines 

#  of  iterations 

157 

Latency  exploitation  (%) 

42.91 

Total  CPU  tine  (sec) 

5.(3 

7.68 

Sayings  in  CPU 
tine  (%) 

26.69 

Table  5.2  Sinulation  data  of  a  DC  analysis 
for  the  MOS  circuit  in  Figure  5.3 


I 


DC  Analysis 


With  Latency  Vithont  Latency 


#  of  subcircuits 
tines  #  of 
iterations 


#  of  nonlatent 
subcircuits  tines 

#  of  iterations 


Latency  exploitation  (%) 


45.27 


Total  CPU  tine  (sec) 


11.87 


14.82 


Savings  in  CPU 
tine  (%) 


19.91 


A0-A161  854 A  STUOV  OF  RELAXATION  TECHNIQUES  FOR  THE  TRANSIENT  2/2 

ANAL VS IS  OF  DIGITAL  Cl  CU>  ILLINOIS  UNIV  CHAMPAIGN 
COGNITIVE  PS VCHOPHVS I OL OGV  LAB  U  CHIA  05  JUN  85 
UNCLASSIFIED  UILU-ENG-85-2205  N00014-84-C-0149  F/G  9/5  NL 


xLxazsiJinxLaard 


gggsrogga 


i\ra  UDJQi^^ 


F  P—8308 


® Block  Diagraa  of  Binary  to- Octal  Decoder. 


In  these  tables  we  find  that  the  savings  in  CPU  tiae  is  not  the 
saae  for  the  latency  exploitation.  For  exaaple,  in  Table  5.1*  for  the 
11-stage  ohain  of  inverters*  a  42.91%  latency  exploitation  was 
achieved  and  a  25.59%  savings  in  CFD  tiae  was  obtained*  becanse  soae 
of  the  CPU  tiae  has  to  be  spent  in  the  latency  cheeking  which  aakes 
the.  difference  between  the  percentage  latency  exploitation  and  the 
percentage  savings  in  the  OTJ  tiae.  In  Table  5.2,  for  the  Binary- 
to-Octal  decoder,  a  45.27%  latency  exploitation  was  achieved  and  a 
19.91%  savings  in  CPU  tiae  was  obtained,  becanse  the  latency  exploi¬ 
tation  only  connts  the  nnaber  of  those  latent  snbcircnits*  and  all 
the' snboirenits  are  preassnaed  to  have  the  saae  siae.  However*  the 
snboironits  aay  have  different  sizes  which  also  aake  the  difference 
between  the  latency  exploitation  and  the  CFEJ  tiae  savings.  Froa 
Table  5.3,  for  the  one-bit  fall  adder,  a  14.75%  savings  in  CPU  tiae 
was  achieved  in  DC  analysis  and  a  22.91%  savings  in  transient 
analysis  was  achieved,  becanse  in  transient  analysis  we  can  take  the 
advantages  of  the  latency  exploitation  in  the  Newton-Raphson  itera¬ 
tion  level  as  well  as  the  advantages  of  latency  exploitation  in  the 


tiae  level 


Table  5.3a  Simulation  data  of  a  DC  analysis 
for  the  MOS  eirenit  in  Figure  5.4 


DC  Analysis 

With  Latency 

Without  Latency 

#  of  anbeircnita 
times  #  of 
iterations 

150 

150 

#  of  nonlatent 
subcircuits  times 

#  of  iterations 

107 

Latency  exploitation  (%) 

28.(7 

Total  CPU  time  (sec) 

(.18 

7.25 

Sayings  in  CPU 
time  (%) 


14.75 


Table  5. 3b  Sinulation  data  of  a  transient  analysis 
for  the  MDS  cirenit  in  Fifnre  5.4 


Transient  Analysis 


With  Latency  Without  Latency 


#  of  snbeirenits 
tines  #  of 
iterations 


#  of  nonlatent 
snbeirenits  tines 

#  of  iterations 


Latency  exploitation  (%) 


Total  CPU  tine  (see) 


Savings  in  CPU 
tine  (%) 


28.92 


31.95 


22.91 


41 .45 


a  b 


In  Chapter  3,  the  linear  teat  eirenit  in  Figure  3.1  has  been 
need  to  study  the  numerical  properties  of  the  Gauss-Seidel  tech¬ 
niques.  This  test  circuit  was  generated  by  linearizing  the  model  of 
a  cascade  of  two  inverters  in  which  the  transistors  are  assumed  to  be 

actiwe.  By  changing  the  paraeters,  i.e.,  the  coupling  tea  C  and 

c 

the  signal  gain  g^,  we  can  adjust  the  degree  of  stiffness  in  our 
linear  test  circuit. 

In  practical  circuits,  the  coupling  teas  and  the  signal  gains 
are  technology-oriented.  In  this  section,  different  technologies 
such  as  NMDS,  CIOS  and  bipolar  junction  transistors  are  used  to 
inplenent  the  cascade  of  a  two  inverter  test  cirouit.  The  NHOS 
inverters  and  CMOS  inverters  are  shown  in  Figure  5.5  and  Figure  5.6, 
respectively.  Figure  5.7  shows  the  TIL  circuit.  Figure  5.8.  shows  the 
ECL  circuit.  Table  5.4,  Table  5.5,  Table  5.6  and  Table  5.7  show  the 
corresponding  simulation  data. 
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Fif.  5.5  NHDS  Circuits. 


The  input  file  for  the  circuit  in  Fig.  5.5 

s  11-stage  chain  inverter  circuit 
•inverter 

. subekt  inv  10  30  20  2 
••*  nodes:  vdd  input  ontpnt 
al  10  20  20  0  da  w-5  1-10 

+  as-25  ad«25  ass-15  asd— 20  cgs-1.725f  cgd*1.725f 

+  rdd-35  rss-35 

■2  20  30  0  0  ea  v=10  1-5 

+  as- 100  ad-100  ass-40  asd-35  cgs-3.45f  cgd-3.45f 

+  rdd-35  rss-35 

.ends 

*  noainal  circuit 
vdd  10  0  5 

vcl  9  0  pulse(0  5  0  2s  2n  125n  254n) 
xl  10  9  11  inv 

x2  10  11  12  inv 

.aodel  dn  xmos  v to— 2  kp-lOu  be-0.52  lsabda-0.05  kpn=0.0918f 
+  as-0.33  coz-0.345f 

.aodel  ea  asos  vto-1  kp-lOu  be-0.52  lsabds-0.05  kpn-0.0918f 
+  as-0.33  eox-0.345f 

.print  tran  v(9)  v(10)  v(ll)  v(12) 

. tran  ln,50n 
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Fig.  5.6  CKOS  Circuits. 


The  input  file  for  the  circuit  in  Fig.  5.6 

a  11-stege  chain  inverter  circuit 
•inverter 

.  subckt  inv  10  20  30 
***  nodes:  vdd  input  output 
ml  10  20  30  10  eml  v- 5  1-5 

+  as-25  ad*=25  ass>15  ssd=20  cgs-1.725f  cgd-1 .725f 

+  rdd=3  5  rss-35 

■2  30  20  0  0  cm2  u-12.5  1-5 

+  as-100  ad-100  ass-40  asd-35  cgs-3.45f  cgd-3.45f 

+  rdd-35  rss«35 

.ends 

•  nominal  circuit 
vdd  10  0  5 

vcl  9  0  pulse (5  0  0  30n  2n  60n  40n) 

*1  10  9  11  inv 

z2  10  11  12  inv 

.model  eml  pm  os  vto--l  kp-8u  be-0.52  lambda-0.05  kpn-0.0918f 
+  ms-0.33  cox-0. 345 f 

.model  cm2  mos  vto— 1  kp-20u  be-0.52  lambda-0. 05  kpn-0.0918f 
+  ms-0.33  cox-0. 345f 

.print  tran  v(9)  v(10)  v(ll)  v(12) 

. tran  ln.SOn 


The  input  file  for  the  circuit  in  Fig.  5.7 


Bipolar  TIL  inverter 

.  subckt  inv  19  8  2 

*  input  node(l)  output  node  (8) 

rl  9  2  4k 

r2  4  0  lk 

r3  9  5  1.4k 

r4  9  6  100 

ql  3  2  1  bjp 

q2  5  3  4  bjp 

q3  6  5  7  bjp 

q4  8  4  0  bjp 

dl  7  8  diode 

.  ends 

*1  2  9  3  inv 
x2  3  9  4  inv 

v2  2  0  pulse(0  5  0  30n  2n  6 On  40n) 
vdd  9  0  5 

.model  bjp  npn(bf-100  br*0.1  re”  100  vm-200  nc*l  ne=l 
+  cje“2p  cjc=2p  cc*»2p) 

.model  diode  d(rs”10) 

.print  tran  v(2)  v(3)  v(4) 

. tran  In. 3 On 
.end 


The  input  file  for  the  circuit  in  Fig.  5.8 


Bipolar  ECL  inverter 

. subckt  inv  7  8 

*  input  node (7)  output  node (8) 

rl  1  0  300 

r2  2  0  300 

r3  3  5  1.18k 

r4  5  8  1.5k 

q2  1  7  3  bjp 

q3  2  4  3  bjp 

q4  0  1  8  bjp 

vrr  4  0  -1.1 

vdd  5  0  -5.2 

.ends 

zl  2  3  inv 
z2  3  4  inv 

v2  2  0  exp (-1.5  1.5  0  30n  2n  60n  40n) 

.model  bjp  npnfof^lOO  br=0.1  rb=200  va-200  nc=l  ne=l 
+  c}v*2p  cjc*=2p  ccsB2p) 

.print  tran  v(2)  v(3)  v(4) 

.  tran  In,  3 On 
.  end 
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r  „• 


mis.  i.4a 

NMOS  inverters  with  signal  gain  *  9.5 
( SLATE- R) 


time 

tine  step 

conpling 

#  of  iterations 

0 

0 

0 

0 

11 

1 

.5000e-10 

.5000e-10 

.482  986-03 

3 

2 

.1000 e-09 

.5000e-10 

.482  9  86-03 

3 

3 

.3000e-09 

.20006-09 

.241496-03 

3 

4 

.1100e-08 

.8000e-09 

.603746-04 

4 

5 

.2000 e-08 

.9000e-09 

.536646-04 

5 

6 

•2050e-08 

.50006-10 

.482976-03 

3 

7 

.2100e-08 

.50006-10 

.965936-03 

3 

8 

.2300e-08 

.20006-09 

.241476-03 

3 

9 

,3100e-08 

.80006-09 

.603736-04 

4 

10 

.5100e-08 

.20006-08 

.234896-04 

6 

11 

.71006-08 

.20006-08 

.193796-04 

4 

12 

.91006-08 

.20006-08 

.184006-04 

4 

13 

.11106-08 

.20006-08 

.184006-04 

4 

14 

.13106-08 

.20006-08 

.18400e-04 

4 

15 

.15106-08 

.20006-08 

.184006-04 

4 

16 

.17106-08 

.20006-08 

.18400e-04 

4 

17 

.19106-08 

.20006-08 

.18400e-04 

4 

18 

.21106-07 

.20006—08 

.184006-04 

4 

19 

.23106-07 

.20006-08 

.184006-04 

4 

20 

.25106-07 

.20006-08 

.184006-04 

3 

21 

.27106-07 

.20006-08 

.18400e-04 

3 

22 

.29106-07 

.2000e-08 

.184006-04 

3 

23 

.31106-07 

.20006-08 

.184006-04 

3 

24 

.33106-07 

.20006-08 

.184006-04 

3 

25 

.35106-07 

.2000e-08 

.184006-04 

3 

26 

.47106-07 

.2000e-08 

.184006-04 

3 

27 

.39106-07 

.20006-08 

.184006-04 

3 

28 

.41106-07 

.20006-08 

.184006-04 

3 

29 

.43106-07 

.2000e-08 

.184006-04 

3 

30 

.45106-07 

.20006-08 

.184006-04 

3 

31 

.47106-07 

.20006-08 

.184006-04 

3 

32 

.49106-07 

,2000e-08 

.184006-04 

3 

33 

.50006-07 

.90006-09 

.408896-04 

3 

100 


C; 


Pi 

.  f 


Csj 

id 


'A 

□ 


\\ 

v> 


.  i 


fi 


.ihLAnJi  a_*.  -^A~w  »  *■  ■t.*  ^ i  A<LA.il  A 


.A  JsJfi  n.J5k  k_*l  4 
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Tab la  5.4fc 

NMOS  inverters  with  signal  gain  *  2..5, 
(SLATE) 


tisie 

time  step 

#  of  iterations 

0 

0 

0 

11 

1 

.5000 e-10 

.5000e-10 

3 

2 

.1000e-09 

.5000 e-10 

2 

3 

.3000 e-09 

.2000e-09 

2 

4 

.1100 e-08 

,8000e-09 

4 

5 

.2000 e-08 

•9000e-09 

5 

6 

.2050 e-08 

.5000e-10 

3 

7 

,2100e-08 

.5000 e-10 

2 

8 

.2300 e-08 

.2000e-09 

3 

9 

.3100C-08 

•8000e-09 

3 

10 

.5100e-08 

.2000e-08 

4 

11 

.7100 e-08 

.2000e-08 

3 

12 

.91Q0e-08 

.2000 e-08 

3 

13 

.1110e-08 

.2000e-08 

3 

14 

.1310e-08 

.2000 e-08 

3 

15 

.1510e-08 

.2000 e-08 

3 

16 

.1710e-08 

.2000e-08 

3 

17 

.1910e-08 

.2000 e-08 

3 

18 

.2110e-07 

.2000e-08 

3 

19 

.2310e-07 

.2000 e-08 

3 

20 

.2510e-07 

.2000e-08 

3 

21 

•2710e-07 

.2000e-08 

3 

22 

,2910e-07 

.2000e-08 

2 

23 

.3110e-07 

,2000e-08 

2 

24 

.3310e-07 

.2000e-08 

2 

25 

,3510e-07 

,2000e-08 

2 

26 

.4710e-07 

.2000e-08 

2 

27 

.3910e-07 

.2000e-08 

2 

28 

.4110e-07 

.2000 e-08 

2 

29 

,4310e-07 

.2000e-08 

2 

30 

.4510e-07 

.2000 e-08 

2 

31 

.4710e-07 

,2000e-08 

2 

32 

,4910e-07 

•2000e-08 

2 

33 

.5000e-07 

.9000 e-09 

2 

vt^vlv1,  i-  -V'-V,  LV&WiYs/i.'A’A'r' 
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Table  £.£a 

CMOS  invertere  with  signal  itin  *  11 .7 
(  SLATE- R) 


S 

«ta 

§ 

'It 

W 

:« 

* 


tine 

tiae  step 

conpl ing 

#  of  iterations 

0 

0 

0 

0 

13 

1 

.5000e-10 

.5000e-10 

.81705e-03 

5 

2 

.1000e-09 

.5000 e-10 

.81705e-03 

4 

3 

.3000 e-09 

,2000e-09 

.40537e-03 

4 

4 

.1100 e-08 

.8000e-09 

•83374e-04 

5 

5 

.3100 e-08 

.2000 e-08 

,33350e-04 

4 

6 

.5100e-08 

.2000 e-08 

•33350e-04 

4 

7 

.7100e-08 

.2000e-08 

,33350e-04 

4 

8 

.9100e-08 

.2000 e-08 

.33350e-04 

4 

9 

.1110e-07 

.2000e-08 

.33350e-04 

4 

10 

•1310e-07 

.2000 e-08 

.33350e-04 

4 

11 

,1510e-07 

.2000 e-08 

.33350e-04 

4 

12 

.1710e-07 

.2000e-08 

.33333e-04 

7 

13 

.1910e-07 

.2000e-08 

,3754Je-04 

7 

14 

.2110e-07 

.2000e-08 

.37662e-04 

5 

15 

•2310e-07 

.2 000 e-08 

.37662e-04 

3 

16 

.2510e-07 

.2000e-08 

,37662e-04 

5 

17 

,2710e-07 

,2000e-08 

,37662e-04 

3 

18 

.2910e-07 

.2000e-08 

,37662e-04 

4 

19 

.3000e-07 

.9000 e-09 

,83674e-04 

3 

20 

.3050e-07 

.5000e-10 

,75325e-03 

4 

21 

.3010e-07 

.5000 e-10 

,15065e-02 

4 

22 

,3030e-07 

.2000e-09 

.37662e-03 

4 

23 

.3110e-07 

.8000 e-09 

•94155e-04 

4 

24 

.3310e-07 

.2000e-08 

,37662e-04 

4 

25 

,3510e-07 

.2000e-08 

.37662e-04 

3 

26 

,4710e-07 

.2000 e-08 

,37662e-04 

3 

27 

,3910e-07 

.2000 e-08 

.37662e-04 

3 

28 

,4110e-07 

.2000e-08 

,37662e-04 

3 

29 

.4310e-07 

.2000e-08 

,37662e-04 

3 

30 

.4510e-07 

.2000 e-08 

,37662e-04 

3 

31 

,4710e-07 

.2000 e-08 

,37662e-04 

3 

32 

,4910e-07 

.2000 e-08 

,37662e-04 

2 

33 

•5000e-07 

.9000e-09 

,83694e-04 

3 

IHUJWUH  JtlW-JJIW" 
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Table  i.5J> 


ss 

OIOS  inverters  with  sisnsl  tain  *  11.7 

ft: 

(SLATE) 

B 

t’j 

ff* 

i5v 

tine 

tine  stop 

#  of  iterations 

0 

0 

0 

0 

13 

«■» 

l 

,5000e-10 

.5000 e-10 

2 

2 

.1000e-09 

.5000e-10 

2 

3 

. 3 000 e— 09 

.20000-09 

3 

4 

.1100 e-08 

.8000 e-09 

3 

' .'  * 

5 

.3100 e-08 

.2000o-08 

3 

i  6 

.5100e-08 

.2000 e-08 

3 

7 

•7100e-08 

.2000O-08 

3 

c% 

8 

.9100e-08 

.2000 e-08 

3 

<v 

V. 

9 

.1110e-07 

.2000O-08 

3 

10 

.1310e-07 

.2000 e-08 

3 

k\ 

11 

•1510e-07 

.2000O-08 

3 

i 

i  12 

•1710e-07 

.2000 e-08 

4 

■ 

1  13 

,1910e-07 

.2000e-08 

5 

14 

•2110e-07 

.2000e-08 

5 

s 

15 

.2310e-07 

.2000 e-08 

3 

16 

•2510e-07 

.2000e-08 

7 

17 

.2710e-07 

.2000 e-08 

3 

g 

1  18 

.2910e-07 

.2000e-08 

3 

B 

!  1» 

.3000e-07 

.9000 e-09 

3 

20 

.3050e-07 

.5000 e-10 

3 

*y 

21 

.3010O-07 

.5000O-10 

2 

i 

u  J 

22 

.3030O-07 

.2000o-09 

3 

23 

.31100-07 

.8000 e-09 

3 

24 

.33100-07 

.2000e-08 

3 

5 

!  25 

.3510o-07 

.20009-08 

3 

X 

26 

.47100-07 

.2000 e-08 

3 

27 

.39100-07 

.2000 e-08 

2 

a  ' 
.  * 

28 

.41100-07 

.20009-08 

2 

fc"*^ 

%*A 

29 

.43100-07 

.2000e-08 

2 

30 

.4510e-07 

.2000e-08 

2 

.-a 

l  31 

.4710o-07 

.2000e-08 

2 

)  32 

.49100-07 

.2000 e-08 

2 

i  *( 

33 

.5000 e-07 

.9000o-09 

2 

.  v 

■A 

■A 


cJ 


* 


£ 


,-: 


3 


a 

ft 
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mu  i-iA 

3H  inwortera  with  ££“100 
( SLATE- R) 


tiae 

tiae  atep 

eonpl ing 

#  of  iterations 

0 

0 

0 

. 10000 e-01 

68 

1 

.5000e-10 

.5000 e-10 

.10000 e-01 

4 

2 

.1000e-09 

•3000e-10 

.10000 e-01 

4 

3 

•3000e-09 

.2000e-09 

.10000 e-01 

4 

4 

.1100 e-08 

.8000©- 09 

.10000e-01 

3 

5 

,3100e-08 

.2000 e-08 

•lOOOOe-Ol 

8 

6 

.3100O-08 

.2000e-08 

.10000 e-01 

5 

7 

.7100e-08 

.2000e-08 

.10000 e-01 

6 

8 

•9100e-08 

.2 000 e-08 

.10000e-01 

7 

9 

.1110e-07 

.2000 e-08 

.10000 e-01 

8 

10 

,1310e-07 

.2000 e-08 

.10000e-01 

9 

11 

•1310e-07 

.2000 e-08 

.10000e-01 

11 

12 

.1710e-07 

.2000e-08 

.10000 e-01 

10 

13 

,1910e-07 

.2000e-08 

.10000e-01 

8 

14 

.2110e-07 

.2000e-08 

.10000 e-01 

8 

13 

.2310e-07 

.2000 e-08 

.10000e-01 

8 

16 

.2510e-07 

.2000e-08 

.10000 e-01 

9 

17 

•2710e-07 

.2000 e-08 

.10000 e-01 

7 

18 

.2910e-07 

.2000e-08 

.10000 e-01 

9 

19 

.3000e-07 

,9000e-09 

.10000 e-01 

7 

The  coupling  it  the  conductance  of  the  eaitter  resistance  re=100. 


Table  £.££ 

HL  inverters  wltb  rg-100 
(SLATE) 


I 


tiae  tiae  step  #  of  iterations 


0 

0 

0 

1 

.5000 e-10 

.50008-10 

2 

.1000 e-09 

.50008-10 

3 

.3000e-09 

.20008-09 

4 

.1100e-08 

.80008-09 

5 

.3100e-08 

.20008-08 

6 

.5100e-08 

.20008-08 

7 

•7100e-08 

.20008-08 

8 

.9100 e-08 

.20008-08 

9 

.1110e-07 

.20008-08 

10 

,1310e-07 

.20008-08 

11 

.1510e-07 

.20008-08 

12 

.1710e-07 

.20008-08 

13 

.19108-07 

.20008-08 

14 

.21108-07 

.20008-08 

15 

.23108-07 

.20008-08 

1« 

.25108-07 

.20008-08 

17 

.27108-07 

.20008-08 

18 

.29108-07 

.20008-08 

19 

.30008-07 

.90008-09 

4 

3 

4 

5 
8 
3 
7 

5 

13 

6 
11 
12 

6 

6 

7 

9 

7 

8 
5 


Tabid  5.6c 

TIL  inverter*  with  re-10 
( SLATE- R) 


tine  tine  step  coupling  #  of  iterations 

0  0  0  . 10000 e-01  40 

1  .5000 e-10  ,5000e-10  .10000e-00  5 

2  .1000 e-09  .5000 e-10  .10000e-00  5 

3  .3000 e-09  .2000e-09  .10000e-00  5 

4  .1100 e-08  ,8000e-09  .10000e-00  7 

5  .3100e-08  .2000e-08  .10000e-00  7 

6  .5100 e-08  .2000e-08  .10000e-00  7 

7  .7100e-08  .2000e-08  .10000e-00  7 

8  .7350e-08  .2500e-09  .10000e-00  7  (*) 

9  .8350e-07  .1000e-08  .10000e-00  29 

10  .1035e-07  .2000e-08  .10000e-00  18 

11  .1235e-07  .2000e-08  .10000e-00  13 

12  .1435e-07  .2000e-08  .10000e-00  14 

13  .1635e-07  .2000e-08  .10000e-00  9 

14  ,1835e-07  .2000e-08  .10000e-00  9 

15  ,2035e-07  .2000e-08  .10000e-00  7 

16  .2235e-07  .2000e-08  .10000e-00  7 

17  .2435e-07  .2000e-08  .10000e-00  9 

18  ,2635e-07  .2000e-08  .10000e-00  7 

19  .2835e-07  .2000e-08  .10000e-00  9 

20  ,3000e-07  .1650e-09  .10000e-00  9 


(*):  The  original  tine  step  can  not  converge  within  50  iterations, 
with  the  rednced  tine  step  being  nsed. 

The  coupling  is  the  conductance  of  the  enitter  resistance  re-10. 


Tab  la  l.,£4 

UL  Ittvetterj  ylj&  xfi-12. 
(SLATE) 


tiae 

tiao  atap 

#  of  ita 

0 

0 

0 

35 

1 

,5000a-10 

.S000a-10 

4 

2 

.1000 e-09 

.5000o-10 

4 

3 

.3000«-09 

.2000 e-09 

4 

4 

.11000-08 

.8000e-09 

7 

5 

.31000-08 

.20000-08 

7 

6 

.51000-08 

.20000-08 

5 

7 

.71000-08 

.20000-08 

7 

8 

.91000-08 

.20000-08 

12 

9 

.11100-07 

.20000-08 

7 

10 

.13100-07 

.20000-08 

13 

11 

.15100-07 

.20000-08 

11 

12 

.17100-07 

.20000-08 

10 

13 

.19100-07 

.20000-08 

15 

14 

.21100-07 

.20000-08 

6 

15 

.23100-07 

.20000-08 

8 

16 

.25100-07 

.20000-08 

6 

17 

.27100-07 

.2000e-08 

6 

18 

.2910O-07 

.20000-08 

6 

19 

.30000-07 

.90000-09 

6 

Table  5 .6e 

TTL  inverters  with  re«=l 
( SLATE- R) 


time 

tiae  step 

coupl ing 

#  of  iterations 

0 

0 

0 

.10000e-01 

65 

1 

,5000e-10 

,5000e-10 

.lOOOOo+Ol 

5 

2 

.1000 e-09 

.5000 e-10 

.lOOOOo+Ol 

6 

3 

.30009-09 

.2000 o-09 

.10000e+01 

6 

4 

.1100e-08 

.8000e-09 

.10000e+01 

7 

5 

.3100 e-08 

,2000e-08 

.lOOOOo+Ol 

7 

6 

.5100 e-08 

.2 000 e-08 

.lOOOOo+Ol 

7 

7 

.7100 e-08 

.20000-08 

.lOOOOo+Ol 

8 

8 

.7350e-08 

.2500O-09 

.10000 e+01 

12  (•) 

9 

.83 50e-07 

.1000 e-08 

.10000 o+Ol 

36 

10 

.1035e-07 

.2000O-08 

.10000e+01 

19 

11 

.1235e-07 

.2000e-08 

.10000 e+01 

9 

12 

.1435e-07 

.2000O-08 

.10000e+01 

7 

13 

.1635e-07 

.2000 e-08 

.10000 e+01 

12 

14 

.1835e-07 

.2000 e-08 

.10000e+01 

13 

15 

.2035e-07 

,2000e-08 

.lOOOOo+Ol 

10 

16 

.2235e-07 

.2000e-08 

•lOOOOe+Ol 

11 

17 

.24350-07 

.2000e-08 

.10000e+01 

11 

18 

.2635e-07 

.2000e-08 

.10000e+01 

14 

19 

.2835e-07 

,2000e-08 

.10000 e+01 

11 

20 

.3000O-07 

.1650e-09 

.10000 e+01 

12 

(*):  The  original  tiae  step  can  not  converge  within  SO  iterations. 


with  the  reduced  time  step  being  used. 


The  coupling  is  the  conductance  of  the  emitter  resistance  re=l 


Table  £.6f 


tine 

0 

.5000e-10 
.1000e-09 
.30006-09 
.1100e-08 
.3100 e-08 
.5100e-08 
.7100e-08 
.9100e-08 
.1110e-07 
.1310e-07 
.1310e-07 
.1710e-07 
.19106-07 
.2110e-07 
.23106-07 
.25106-07 
.27106-07 
.29106-07 
.30006-07 


TTL  inverters  with  re»l 
( SLATE) 


tine  step  #  of  iterations 


0  40 

.SOOOe-lO  4 

.5000 e-10  4 

.20006-09  4 

.80006-09  7 

.2000e-08  7 

.2000e-08  5 

.2000e-08  7 

.20006-08  12 

.20006-08  10 

.20006-08  12 

.2000e-08  6 

.2000e-08  10 

.20006-08  15 

.20006-08  9 

.20006-08  10 

.20006-08  8 

.20006-08  6 

.20006-08  8 

.90006-09  5 


T.rtlfi  5.7a 

ECL  inverters  with  rb«100 
( SLATE- R) 


time 

time  step 

0 

0 

0 

1 

.7812e-12 

.7812e-12 

2 

.1562e-ll 

.7812e-12 

3 

.1953 e-11 

,3906e-12 

4 

•3615e-ll 

.1562e-ll 

5 

.4297 e-11 

.7812e-12 

6 

.46 87 e-11 

.3906  e-12 

7 

.6250 e-11 

.1562e-ll 

8 

.7031e-ll 

.7812e-12 

9 

,7422e-ll 

.3906e-12 

10 

.8984 e-11 

,1562e-ll 

11 

•9766e-ll 

,7812e-12 

12 

,1016e-10 

.3906e-12 

13 

.1172e-10 

.  .1562e-ll 

14 

•1250e-10 

.7812e-12 

15 

,1289e-10 

.3906e-12 

16 

,1309e-10 

•  1953e-12 

17 

.1387 e-10 

.7812e-12 

18 

,1426e-10 

.3906e-12 

coupling  #  of  iterations 

.10000e-01  38 

.10000 e-01  4  (*•) 

.lOOOOe-Ol  4 

.lOOOOo-Ol  3  (•) 

.10000 e-01  3 

.10000 e-01  3  (•) 

.10000e-01  3  (•) 

.10000 e-01  3 

.10000e-01  3  (•) 

.10000 e-01  3  (•) 

.10000 e-01  3 

.10000e-01  3  (•) 

.10000e-01  3  (*) 

.10000 e-01  3 

.10000e-01  3  (•) 

.10000e-01  3  (•) 

.10000e-01  3  (•) 

.10000e-01  3 

.10000e-01  3 


(•*): 

The  original  tine  step  can  not  converge  within  100  iterations, 
with  the  reduced  tine  step  being  used. 

(*):  The  original  time  step  can  not  converge  within  50  iterations, 
with  the  reduced  time  step  being  used. 

The  coupling  is  the  conductance  of  the  base  resistance  rb=100. 


Table  5 .7b 

ECL  inverters  with  rb=100 
(SLATE) 

time  time  step  #  of  iterations 


0  0  12 

5000e-10  .5000e-10  3 

1000e-09  .5000e-10  3 

3000e-09  .2000 e-09  4 

1100e-09  .8000e-09  4 

2000e-08  ,9000e-09  4 

2050e-08  .5000e-10  4 

2100e-08  .5000 e-10  3 

2300e-08  .2000e-09  3 

3100 e-08  .8000 e-09  5 

5100 e-08  .2000 e-08  5 

7100e-08  .2000 e-08  4 


91 00 e-08  .2000e-08  4 
1110e-07  .2000e-08  3 
1310e-07  .2000e-08  3 
1510e-07  .2000e-08  3 
1710e-07  ,2000e-08  3 


1910e-07  .2000e-08  3 

2110e-07  .2000e-08  3 

2310e-07  .2000 e-08  3 

2510e-07  .2000 e-08  3 

2710e-07  .2000e-08  3 

2910e-07  .2000e-08  3 

3000e-07  .9000 e-09  3 


It  is  shown  in  Table  5.4  and  Table  5.5  that  the  simulation 


results  of  the  NMOS  circuits  and  OIOS  circuits  are  good  and  the  time 
steps  used  in  SLATE-R  are  identical  with  those  in  the  SLATE  program, 
because  of  the  low  coupling  terms  with  the  MDS  technologies.  OIOS 
circuits  take  slightly  more  iterations  than  NMOS  circuits  due  to  con¬ 
vergence,  because  of  the  higher  signal  gain  and  stronger  coupling 
capacitance  with  the  CMOS  technology.  From  Table  5.6  and  Table  5.7, 
it  is  found  that  for  the  bipolar  transistor  circuits,  the  number  of 
iterations  required  for  convergence  is  excessive  because  of  the 
increased  coupling  and  gain  of  the  input  circuits.  By  changing  the 
emitter  resistance  in  Table  5.6  to  adjust  the  coupling,  it  clearly 
shows  the  effect  of  the  coupling  on  the  convergence  speed.  A  limita¬ 
tion  on  the  number  of  iterations  per  time  point  was  set  in  the 
SLATE-R  program.  If  the  current  time  step  can  not  reach  convergence 
in  a  certain  number  of  iterations,  a  new  time  step  is  chosen.  Com¬ 
pared  with  the  TIL  circuits,  the  ECL  circuits  exhibit  shorter  gate 
propagation  delay  which  causes  the  simulation  of  the  ECL  circuit  to 
take  more  iterations  and  eventually  result  in  very  small  time  steps. 


CHAPTER  6 


THE  SLATE- R  PROGRAM 

With  the  advantages  mentioned  in  the  previous  chapters,  the 
SLATE  Program  has  been  chosen  to  be  the  pedestal  to  implement  the 
time-point  relaxation  techniques.  The  SLATE  Program  exploits  the 
node  tearing  method  which  basically  is  just  a  generalized  Norton 
equivalent  circuit  approach.  Each  subcircuit  can  be  extracted  as  an 
equivalent  circuit  via  the  LU  factorization.  By  clustering  all  the 
equivalent  subcircuits  to  construct  a  simplified  equivalent  circuit, 
which  is  the  interconnection  matrix  in  Eq.  (5.2),  one  can  solve  the 
interconnection  matrix  to  obtain  the  node  voltages  of  all  the  tearing 
nodes  for  each  subcircuit.  The  next  step  is  to  go  back  to  each  sub¬ 
circuit  with  the  tearing  node  voltages  as  the  stimuli  to  get  the  node 
voltage  of  the  internal  nodes.  From  the  viewpoint  of  the  matrix 
equations,  the  SLATE  program  is  still  inverting  the  whole  matrix;  all 
the  subcircuits  can  be  solved  simultaneously.  Vith  this  solution 
strategy,  one  need  not  worry  about  the  order  in  which  the  subcircuits 
are  to  be  processed,  and  the  fan-in  and  fan-out  nodes  do  not  have  to 
be  identified. 

However,  the  solution  strategy  of  the  relaxation  techniques  is 
to  process  each  subcircuit  individually  by  treating  the  fan-in  nodes 
as  the  sources  and  the  fan-out  nodes  as  inputs  to  the  next  stage  of 
subcircuits.  Hence,  one  should  identify  the  fan-in  nodes  and  fan-out 
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node*  of  each  snbcircnit  and  also  determine  the  order  in  which  the 


shbcircnits  are  to  be  processed. 


In  this  chapter,  we  will  describe  the  implementation  of  the  pro¬ 


gram  SLATE— R  (a  Simulator  with  Latency  and  Tearing  -  Relaxed  ver¬ 


sion)  .  Different  iterative  schemes  of  the  relaxation  approaches  are 


also  described. 


6.1  The  Input  Process ins 


READ IN  Overlap 


For  the  exploitation  of  the  relaxation  techniques,  the  fan-in 


nodes  and  the  fan-out  nodes  of  each  subcircuit  have  to  be  identified. 


However,  the  circuit  input  format  of  the  SLATE  (SPICE2)  program  only 


gives  the  external  nodes  of  each  subcircuit  without  specifying  the 


fan-in  nodes  and  fan-out  nodes.  The  first  step  of  the  program  imple¬ 


mentation  is  to  change  the  SUBCKT  card  in  the  input  file.  The  general 


form  of  the  subcircuit  definition  in  the  SLATE  program  is  as  follows: 


General  form 


.SOBCKT  SUB NAM  N1  <N2  N3  . . . NE> 


Examples: 


.SOBCXT  NAND2  10  20  30  40 


In  the  subcircuit  definition,  SUBNAM  is  the  subcircuit  name,  and 


N1 ,  N2,  ...  ,NE  are  the  external  nodes.  For  example,  the  NAND2  c i i 


cuit  has  four  external  nodes,  node  #10  is  the  DC  power  supply,  node 


term 


#20  and  node  #30  are  the  input  nodes  while  node  #40  is  the  ontpnt 
node.  In  order  to  identify  the  fan-in  nodes  and  the  fan-out  nodes, 
the  SUBCKT  card  is  aiodified  to  the  fom: 


General  fora 

. SOB GET  SUB NAM  N1  <N2  N3  ...  NE>  NFIN 

Examples: 

•  SUB (IT  NAND2  10  20  30  40  3 

where  NFIN  is  the  number  of  the  fan- in  nodes  of  the  subcircuit.  The 
DC  power  supplies  are  treated  as  the  fan-in  nodes  in  order  to  obtain 
the  optimal  reordering  [10]  performance.  In  this  case,  the  NAND2 
circuit  has  three  fan-in  nodes  and  one  fan-out  node,  so  NFIN  *  3. 

£.2  Analysis  Sequencing  -  ERRCHK  Overlay 

In  the  ERRCHK  overlay,  the  SLATE  program  constructs  the  node 
connection  lists.  This  primitive  and  the  event-driven  algorithm  in 
Section  5.3  are  used  to  set  up  the  order  in  which  the  subcircuits  are 
to  be  processed. 

£•2  Matrix  Setup  and  Matrix  Location  -  SgTPP  Overlay 

With  the  exploitation  of  the  node  tearing  approach,  the  SLATE 
program  reorders  all  the  tearing  nodes  to  the  border  as  shown  in  Eq. 
(5.1).  The  interconnection  matrix  shown  in  Eq.  (5.2)  is  the  core  in 
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the  solution  strategies  of  the  node  tearing  techniques.  Because  the 
relaxation  techniques  need  not  fomulate  the  interconnection  aatrix, 
each  subeircuit  can  be  treated  as  one  'independent"  circuit  with  its 
fan-ins  as  the  stiauli  input  Toltage  sources.  Hence,  the  aatrix  set 
up  is  straightforward.  The  repetitiveness  property  of  subcircuits  is 
still  used  in  the  SLATE- R  prograa,  i.e.,  only  one  subeircuit  descrip¬ 
tion  for  each  type  of  repetitive  subcircuit  need  be  stored  and  only 
one  set  of  the  subaatrix  sparse  aatrix  pointers  for  each  type  of 
repetitive  subeircuit  is  needed.  In  this  overlay,  a  set  of  fan-in 
and  fan-out  lists  are  established  to  trace  the  input  stiauli  and  the 
coupling  terns  aaong  each  subcircuit. 

6.±  Analysis  Procedure  -  DCTRAN  Overlay  la  SLATE  Proaran 

<•14  Algorithms  in  SLATE  Protran 

The  analysis  algorithms  used  in  SLATE  prograa  are  shown  below: 

initial ize; 

TIME  -  0 

call  SORE PD  to  set  sources  for  the  entire  circuit; 

call  ITER8; 

if  (not  converged)  stop  analysis; 

(  print  operating-point  solution; 

} 


savout:  store  outputs 


new tin:  TIME  -  TIME  +  DELTA 


if  (TIME  >  TSTOP  )  exit; 

(  adjust  DELTA  for  breakpoint  table  values; 
call  SOHDPD; 
call  ITEB8; 

} 

if  (converged)  goto  tsterr; 

{  TIME  =  TIME  -  DELTA; 

DELTA  -  DELTA/ 8; 
goto  tstdel; 

} 

tsterr:  call  ISI7NC; 

if  (error  acceptable)  goto  savout; 

{  TIME  *  TIME  -  DELTA; 

DELTA  “  DEL  NEW  (coaipnted  in  ISDNC) ; 

} 

tstdel:  if  (DELTA  <  D ELM IN  )  stop  analysis; 
goto  newtia; 

The  actual  Newton-Raphson  iteration  is  controlled  by  subroutine 
ITER8 .  A  flow  chart  for  that  subroutine  follows. 

£.4, .2  Iteration  Schene  in  SLATE  Program 


ITERNO  -  NON  00 V  -  0  ; 


DONE  ■  .false.; 


while  (not  done) 
yload:  (  call  1LOAD; 

if  ((NOSOLV  is  nonzero)  and 

(analysis  •  initial  transient))  exit; 
load  the  circuit  (except  snbcircnits) ; 
locx*locate(19) ;  (load  the  first  snbcircnit) 
snbckl:  if  (locx  ■  0)  goto  sdedcn  ; 
latency  check; 

(in  tine  level  or  Newton-Saphson  iteration  level) 
if  (nodplc(locx+9)  is  nonzero)  goto  nxtckl; 
load  elements  in  snbcircnit; 

nxtckl:  locx  *  nodplc(loex);  (search  for  the  next  snbcircnit) 
goto  snbckl; 

sdcdcm:  locx  *  locate (19); 
snbcks :  if  (locx  *  0)  exit; 

if  (uodplc( locx+9)  *  1)  (latency  in  tine  level)  goto  nxtcks; 
call  SDCDCM;  (LU  deconposition  for  snbcircnit) 
nxtcks:  locx  *  nodplc(locx);  goto  snbcks; 

} 

ITERNO  =  ITERNO  +  1 

if  (ITERNO  >  iteration  linit)  exit; 

if  (NON CON  «  0  )  DONE  =  .true.; 

{  call  DCDCMP; 

(LU  deconposition  for  the  interconnection  matrix) 


{  call  DCSCL;  (solve  the  interconnection  aatriz) 
if  (ell  the  tearing  nodes  converged)  NON CON  •  0; 

) 

locz  -  locate(19) 
solckt:  if  (locz  =  0)  goto  yload; 
check  latency  flag; 
if  ( (nodplc( locz+9)  is  nonzero  )  and 
(keep  latency))  goto  sdcsol; 

{  call  SDCSOL; 

(solve  the  subcircuit  with  backward  substitution) 
check  convergence; 

if  (not  converged)  NGN  CON  -  NON CON  +  1 

} 

sdcsol:  locz  *  nodplc(locz);  goto  solckt 

) 

Froa  the  algorithms  used  in  the  SLATE  program,  it  is  found  that 
with  the  iapleaentation  of  the  node  tearing  technique  the  SLATE  pro¬ 
gram  can  take  advantage  of  the  latency  ezploitation.  However,  the 
node  tearing  approach  used  here  is  one  special  reordering  technique 
which  puts  all  the  tearing  nodes  to  the  border  of  the  system  matrix. 
By  solving  the  resulting  interconnection  aatriz  [8],  one  gets  the 
ezternal  node  voltages  for  each  subcircuit,  while  the  internal  node 
voltages  are  solved  by  backward  substitutions.  Froa  the  viewpoint  of 
solving  the  aatriz  equations  of  the  system,  full  aatriz  inversion  is 


still  needed  in  the  SLATE  program.  In  the  next  section,  ve  describe 
how  the  relsxstion  techniques  ere  implemented  in  the  DCilcAN  overlay 
end  the  core  overlay  of  the  SLATE  program. 


Because  each  time  only  one  snbcircuit  is  analyzed  with  the  use 
of  the  relaxation  techniques,  ve  just  need  to  process  the  individual 
siatrix  equations  for  the  corresponding  subcircuit.  This  is  a  depar¬ 
ture  froa  the  analysis  procedure  with  the  SLATE  program.  The 
analysis  algorithms  used  in  the  SLATE-R  program  are  shown  below: 


6.5.1  Aifor<th««  in  SLATE- B  Proaram 


initialize; 

TIME  -  0 
call  ITER8; 

if  (not  converged)  stop  analysis; 

{  print  operating-point  solution; 

} 

savout:  store  outputs 
newtia:  TIME  «  TIME  +  DELTA 

if  (TIME  >  TSTOP  )  exit; 

(  adjust  DELTA  for  breakpoint  table  values; 


call  ITERS; 


if  (converged)  goto  tsterr; 
(  TIME  -  TIME  -  DELTA; 


DELTA  -  DELTA/8; 
goto  tstdel; 

} 

) 

tsterr:  cell  TBUNC; 

if  (error  acceptable)  goto  savont; 

(  TIME  -  TIME  -  DELTA; 

DELTA  -  DELNEW  (computed  in  UOJNC) ; 

} 

tstdel:  if  (DELTA  <  DELMIN  )  stop  analysis; 
goto  newtim; 

Two  different  iteration  schemes  are  implemented.  In  order  to 
describe  the  iteration  schemes,  let  us  recall  Eq.  (3.6). 

•  / _n+l  ,n+l  n+1  n  _n»  _  « 

gk(xl  '  x2  '  •••»  xk-l*  k'  xk+l'  xm}  0 

At  time  t^^,  the  k-th  component  of  X11**.  x°  *  i*  obtained  by  solving 
the  above  scalar  equation.  In  our  implementation,  one  iteration 
scheme  is  that  each  time  we  only  iterate  one  subcircuit  once  and  go 
to  the  next  subcircuit  and  continue  this  process  until  all  the  sub¬ 
circuits  are  converged.  The  other  scheme  is  to  iterate  each  subcir¬ 
cuit  until  convergence  is  reached  and  then  go  to  the  next  subcir¬ 
cuit.  After  all  the  subcircuits  are  processed,  then  go  back  to  the 


first  subcircuit  to  check  if  the  iteration  has  converged.  This  pro¬ 
cess  proceeds  until  all  the  sobcironits  are  checked.  The  flow  chart 
for  the  implementation  of  the  first  iteration  scheme  follows. 


<5.j5. .2  Iteration  Scheme  I  _ip  SLATE- R  Prosram 

ITERNO  -  0  ; 

DONE  -  .false.; 
while  (not  done) 
icheck  *  0  (convergence  flag); 
iterat:  locz  «  locate(19) 
subckl:  if  (locz  -  0)  goto  ncheck; 

(  call  SORDID; 

(to  set  sources  for  this  subcircuit) 

} 

(  call  YLOAD; 

(to  load  elements  for  this  subcircuit) 
if  ((NOSOLV  is  nonzero)  and 

(analysis  “  initial  transient))  ezit; 
latency  check; 

(in  time  level  or  Newton- Raphson  iteration  level) 
if  (nodplc( locz+9)  is  nonzero)  ezit; 

{  load  elements  in  subcircuit; 


call  SDCDCM; 

(LU  decomposition  for  this  subcircuit) 
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} 

} 

if  (nodplc(loex+9)  it  nonzero)  goto  nxtckl; 

{  cell  SDCSCL; 

(solve  the  snboircnit  with  beckwerd  substitution) 

NON  CON  -  0; 

check  convergence;  (for  this  subcircuit  only) 
if  (ell  the  nodes  of  this  subcircuit  converged)  NON 00N  * 
{  NON  CON  -  NON  CON  +  1 
) 

if  (NON CON  “  0)  goto  nxtckl; 

{  i check  ■  icheck  +  1 

} 

} 

nxtckl:  locx  «  nodplc(locx) ; 

(search  for  the  next  subcircuit) 
goto  subckl; 

ncheck:  check  convergence;  (for  the  entire  circuit) 
if  (icheck  *  0)  DONE  ■  .true.; 

(  ITERNO  -  ITERNO  +  1 

if  (ITERNO  >  iteration  limit)  exit; 
icheck  *  0; 
goto  iterat 


} 


/ 


h  "jm'Ri  *nr  mr  ^  u  r 


The  following  is  the  flow  chert  for  the  implementation  of  the  second 
iteration  scheme: 


£.£.1  Iteration  Scheme  II  in  SLATE- R  Progrs 


' FERNO  ■  0  ; 
DONE  »  .false. ; 


while  (not  done) 


icheck  *  0  (global  convergence  flag); 
ichckt  ■  0  (local  convergence  flag); 
iterat:  locx  *  locate(19) 
snbckl:  if  (locx  ■  0)  goto  ncheck; 


ichckt  «  0 


{  call  SORUPD; 

(to  set  sources  for  this  subcircuit) 


yload:  {  call  YLOAD;  (to  load  elements  for  this  subcircuit) 

if  (ichckt  is  nonzero)  goto  yload2; 
if  ( ( NOSCLV  is  nonzero)  and 

(analysis  =  initial  transient))  exit; 
latency  check; 

(in  time  level  or  Newton-Raphson  iteration  level) 
if  (nodplc(locx+9)  is  nonzero)  exit; 
yload2:  load  elements  in  subcircuit; 

(  call  SDCDCM  (LD  decomposition  for  this  subcircuit); 


•>V.VAV; 
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} 

if  (nodplc(locx+9)  is  nonzero)  goto  nxtckl; 

{  call  SDCSOL; 

(solve  the  subcircuit  with  backward  substitution) 

NON CON  =  0; 

check  convergence;  (for  this  snbcircnit  only) 
if  (all  the  nodes  of  this  snbcircnit  converged)  NONCON  =  0; 
(  NONCON  -  NONCON  +  1 

if  (NONCON  =  0)  goto  nxtckl; 

{  ichckt  *  ichckt  +  1 
icheck  =  icheck  +  1 
goto  yload; 

) 

} 

nxtckl:  locx  ”  nodplc(locx) ;  (search  for  the  next  snbcircnit) 
goto  snbckl; 

ncheck:  check  convergence;  (for  the  entire  circuit) 
if  (icheck  =  0)  DONE  =  .true.; 

{  ITERNO  *  ITERNO  +  1 

if  (ITERNO  >  iteration  limit)  exit; 
goto  iterat 

} 


The  experimental  results  for  different  iteration  schemes  are 
given  in  the  next  section. 
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Table  6.2  shows  the  results  for  the  binary-to-octal  decoder  ci 
cuit  in  Fig.  5 .3 . 


Table  6.2 


Simulation  data  for  the  binary  to- octal  decoder  circuit 

DC  Analysis 


CPU  (sec) 

Scheme  I 

Scheme  II 

SLATE 

11.87 

7.85 

6.27 

Transient  Analysis 


CPU  (sec) 


Scheme  I 


Scheme  II 


SLATE 


Table  6.3  shows  the  results  for  the  one-bit  full  adder  circuit  in 
Fig.  5.4. 


Table  6.3 

Simulation  data  for  the  one-bit  foil  adder  circuit 


DC  Analysis 


CPU  (sec) 

Scheme  I 

Scheme  II 

SLATE 

6.18 

5.25 

4.67 

Transient  Analysis 


CPU  (sec) 

Scheme  I 

Scheme  II 

SLATE 

Table  6.4  shove  the  resalts  for  the  bipolar  transistor  cascade  of 
inverters  in  Fig.  5.7. 


Table 

Slmnlation  data  for  the  bipolar  transistor  inverters 

DC  Analysis 


(sec) 


Scheme  I 

Scheme  II 

SLATE 

2.62 

3.52 

2.12 

Transient  Analvsi 


CPU  (sec) 

Scheme  I 

Scheme  II 

SLATE 

14.45 

33.07 

8.42 

6.2  Concladina  Remarks 
6.1.1  Remark  J 

From  Table  6.1,  Table  6.2  and  Table  6.3,  it  is  fonnd  that  for 
the  MDS  digital  circuits.  Scheme  II  vorks  faster  than  Scheme  I  does 
in  the  DC  analysis,  while  in  the  transient  analysis.  Scheme  II  is 
slower  than  Scheme  I.  For  example,  for  the  11-stage  inverter  chains, 
in  the  DC  analysis  Scheme  II  takes  3.20  seconds  and  Scheme  I  takes 
7.23  seconds,  whereas  in  the  transient  analysis  Scheme  II  takes  52.92 


seconds  and  Scheme  I  takes  39.72  seconds.  On  the  other  hand,  from 
Table  6.4.  for  the  bipolar  digital  circuit,  in  the  DC  analysis. 
Scheme  II  takes  3.52  seconds  and  Scheme  I  takes  2.62  seconds,  while 
in  the  transient  analysis.  Scheme  II  takes  33.07  seconds  and  Scheme  I 
takes  14.45  seconds.  In  other  words.  Scheme  II  is  always  slower  than 
Scheme  I  in  the  simulation  of  the  bipolar  digital  circuits. 

Because  there  are  no  coupling  capacitors  for  the  NOS  digital 
circuits  in  the  DC  analysis,  in  Scheme  II  one  subcircuit  is  processed 
to  convergence  and  then  fed  to  the  next  subcircuit  which  obtains  the 
exact  stimuli  to  process  the  analysis,  hence,  there  is  no  waste  in 
the  iterative  process.  For  Scheme  I  the  waveforms  fed  from  the  fan- 
in  subcircuit  into  the  next  stage  are  not  accurate  until  the  fan-in 
subcircuit  achieves  convergence;  before  that,  all  the  iterative 
processes  do  not  make  any  sense. 

However,  in  the  transient  analysis,  from  Table  5.5  it  is  found 
that  the  typical  number  of  iterations  needed  to  achieve  convergence 
for  NOS  digital  circuits  is  three  or  four  at  each  time  point.  Let  us 
define  one  unit  as  one  iteration  for  one  subcircuit.  If  we  take  a 
four  cascade  chain  of  uniform  subcircuits  as  an  example,  then  Scheme 
I  takes  four  iteration  sweeps  to  reach  convergence  with  each  sweep 
analyzing  four  subcircuits;  then  the  total  nnits  needed  for  Scheme  I 
are  16.  If  Scheme  II  is  used,  it  takes  four  iterations  for  each  sub¬ 
circuit  to  reach  local  convergence,  that  means  in  the  first  run  the 
total  units  are  16  and  it  takes  at  least  two  runs  to  ensure  global 
convergence,  that  is,  32  units. 


The  strong  coupling  tens  in  the  bipolar  technology  make  the 
relaxation  technique  take  more  iterations  to  achieve  convergence  as 
shown  in  Table  5.6.  The  typical  number  of  iterations  needed  to 
achieve  convergence  for  bipolar  transistor  circuits  is  six  to  eight 
at  each  time  point.  Similarly,  let  us  take  the  four  cascade  chain  of 
uni form  snbcircuits  as  an  example.  For  Scheme  I  it  takes  six  sweeps 
to  reach  convergence,  that  is,  24  units.  While  Scheme  II  takes  six 
iterations  to  achieve  local  convergence,  each  run  needs  24  itera¬ 
tions.  Scheme  II  needs  at  least  two  runs  to  ensure  global  conver¬ 
gence,  that  is,  48  units. 

Our  conclusion  is  that,  if  there  is  no  coupling,  then  Scheme  II 
which  is  used  in  MDTIS-C  and  PKEMOS  works  very  well  just  like  MOTIS-C 
and  PREMOS.  When  there  is  coupling,  then  Scheme  I  works  faster  than 
Scheme  II  does. 

6.1.2  Remark  II 

In  order  to  see  the  performance  of  the  SLATE-K  program  in  the 
simulation  of  strongly  connected  circuits.  Table  6.5  and  Table  6.6 
show  the  results  for  the  3-stage  ring  oscillator  in  Fig.  4.1  and  the 
SR  Flip  Flop  in  Fig.  6.1.  Only  Scheme  I  is  used. 


Table  6.5  above  tbe  results  for  the  3-stage  ring  oscillator  in 
Fig.  4.1. 

Table  £.£ 

Simulation  data  for  the  3-stage  ring  oscillator 
DC  Analysis 
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Tabl •  6.6  show*  the  results  for  the  SK  Flip  Flop  in  Fig.  6.1. 

Tibia  £•£ 

Simulation  data  for  th»  _gR  Flip  Flop. 

PC  Annlyala 


CPU  (see) 

Scheme  I 

SLATE 

3.13 

2.05 

Transient  Analysis 


a 


s 


i 


136 


ct 

Table  6.8  shows  the  results  for  the  one-bit  full  adder  circuit  in 
Fig.  5.4  without  the  latency  exploitation. 

Table  £  .3  ^ 


Transient  Analysis 


CPU  (see) 

#  of  iterations 

SLATE-R 

41 .45 

84 

SLATE 

17.97 

46 

It  is  found  that,  for  DC  analysis,  SLATE— R  and  SLATE  take  the 
sane  number  of  iterations.  In  Table  6 .7 ,  the  CPU  tine  for  SLATE— R  is 
7.25  seconds  and  is  144.14%  of  the  5.03  seconds  CPU  tine  for  SLATE. 
Thus,  we  can  estimate  the  overhead  for  the  relaxation  nethod  to 
predict  the  coupling  is  44.14%.  Similarly,  in  Table  6.8,  the  over¬ 
head  to  predict  the  coupling  is  35.65%. 

In  transient  analysis.  Table  6.7  shows  the  SLATE-R  takes  84 
iterations  in  41.45  seconds  of  CPU  time,  while  SLATE  takes  only  46 
iterations  in  17.97  seconds.  The  CPU  time  spent  in  SLATE-R  is  230.66% 
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of  that  spent  in  SLATE;  the  iteration  number  needed  in  SLATE-R  is 
182.61%  of  that  needed  in  SLATE.  The  overhead  to  predict  the  cou¬ 
pling  is  48.05%.  In  Table  6.8,  the  CPU  tine  spent  in  SLATE-R  is 
310.12%  of  that  spent  in  SLATE,  and  the  iteration  number  needed  in 
SLATE-R  is  274.26%  of  that  needed  in  SLATE.  The  overhead  to  predict 
the  coupling  is  35.86%. 

Fran  Table  6.7  and  Table  6.8,  ve  can  determine  the  dominant  fac¬ 
tor  which  makes  the  difference  in  simulation  speed.  As  we  can  see 
more  iterations  are  required  for  the  relaxation  method,  and  the  over¬ 
head  for  the  predictor  is  also  a  considerable  factor. 


CHAPTER  7 


m 


CONCLUSIONS 


We  used  a  siaple  linear  circuit  model  to  represent  one  gate 
driving  another  gate,  and  ve  investigated  the  numerical  stability  and 
convergence  properties  of  the  time-point  (Gauss- Seidel)  and  waveform 
relaxation  methods  as  a  function  of  the  amount  of  parasitic  coupling 
capacitance  between  the  gates  and  the  gain  of  the  driven  gate  in  the 
active  region. 

In  Chapter  3  our  investigation  shows  that  the  modified  Gauss- 
Seidel  time-point  relaxation  method  does  not  behave  well  for  linear 
stiff  systems  if  the  stiffness  is  caused  by  the  coupling.  The  method 
becomes  numerically  unstable  when  the  size  of  the  time  step  exceeds 
the  smallest  time  constant  in  the  system  by  a  factor  of  three  or 
more. 
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In  Chapter  4  we  apply  the  modified  waveform  relaxation  method 
for  the  analysis  of  linear  stiff  systems.  It  is  found  that  the  itera¬ 
tions  oscillate  about  the  solution  and  an  excessive  number  of  itera¬ 
tions  are  required  for  convergence  unless  the  time  windows  are 
approximately  the  size  of  the  smallest  time  constant  of  the  system. 
These  results  clearly  indicate  that  if  the  system  stiffness  (strong 
Hiller  effect)  is  caused  by  the  coupling,  then  the  integration  step 
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size  cannot  exceed  approximately  the  smallest  time  constant  in  the 
circuit  in  order  to  have  good  convergence  properties. 
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However,  practical  implementation  shows  that  both  the  waveform 
relaxation  method  and  the  modified  Ganss-Seidel  method  are  not 
affected  very  much  by  the  pole-splitting  phenomenon  in  the  simulation 
of  NOS  digital  circuits.  This  seems  to  be  due  to  the  low  gain  and 
weak  coupling  in  these  circuits.  Also,  digital  NOS  circuits  are 
switched  on  and  off  fairly  rapidly  so  that  they  are  in  the  linear 
active  region  only  a  small  percentage  of  the  time  interval. 

In  SLATE  the  time  steps  for  transient  analysis  are  controlled  by 
the  local  truncation  error  only.  However,  because  of  the  poorer  sta¬ 
bility  convergence  properties  of  the  relaxation  methods  for  some 
problems,  the  time  step  is  controlled  first  by  the  local  truncation 
erros,  but  if  the  iteration  limit  is  exceeded,  the  time  step  is 
reduced  until  good  convergence  is  achieved.  Thus,  the  SLATE-R  pro¬ 
gram  may  not  only  require  more  iterations  per  time  point,  but  also 
more  time  points  may  be  required  to  obtain  the  solution  in  a  given 
time  interval. 

It  is  shown  in  Table  5.4a  and  Table  5.4b,  Table  5.5a  and  Table 
5.5b  that  the  time  steps  used  in  SLATE-R  to  analyze  the  NOS  digi¬ 
tal  circuit  examples  in  this  thesis  are  identical  with  those  in  the 
original  SLATE  program.  This  means  the  time  step  is  still  controlled 
by  the  local  truncation  error  in  SLATE-R  for  these  examples.  However, 
for  bipolar  technology,  our  results  show  that  it  takes  a  tremendous 
number  of  iterations  to  achieve  convergence.  In  some  cases,  the 
number  of  iterations  exceed  the  iteration  limit  which  forces  the  time 


step  to  be  reduced  and  eventually  causes  the  error  message  of  "inter- 
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nal  time  step  too  small"  to  stop  the  execution.  This  means  that  when 
the  coupling  between  subcircuits  is  too  strong,  the  time  steps  are 
controlled  by  the  iteration  count  instead  of  the  local  truncation 
error.  Also,  occasionally  the  wide  oscillation  between  successive 
iterations  causes  the  arithmetic  operations  to  overflow  before  the 
iteration  limit  is  reached,  because  of  the  strong  coupling  (terminal 
resistances)  in  the  bipolar  circuits.  In  Chapter  5,  we  used  different 
terminal  resistances  for  the  bipolar  TIL  circuits  and  verified  that 
the  strong  coupling  causes  the  convergence  problem. 

Event  driven  techniques  and  latency  exploitation  are  implemented 
in  SLATE- R.  Chapter  5  illustrates  the  effects  of  latency  exploita¬ 
tion  on  savings  of  CPU  time.  The  program  structure  is  shown  in 
Chapter  6.  Two  iteration  schemes  are  discussed.  It  is  found  that  the 
performance  of  different  iteration  schemes  are  coupling-oriented.  For 
the  cases  of  no  coupling  and  strong  coupling.  Scheme  II  works  better 
than  Scheme  I,  while  for  weak  coupling,  such  as  the  floating  capaci¬ 
tor  in  MOS  circuits.  Scheme  I  is  preferred.  For  very  strong  coupling, 
both  Scheme  I  and  Scheme  II  have  poor  numerical  properties. 

With  the  relaxation  technique,  there  is  no  need  to  formulate  the 
interconnection  matrix  like  in  the  SLATE  program.  In  the  SLATE-R  pro¬ 
gram,  each  subcircuit  can  be  processed  individually,  which  is  very 
suitable  for  a  multiprocessor  computer  system.  Currently,  we  use  the 
same  time  steps  for  all  the  subcircuits.  Some  advantage  can  be 
obtained  if  each  subcircuit  uses  its  own  time  step.  However,  the 
overhead  for  data  management  in  a  single  processor  system  might 
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offset  the  algorithms tic  advantage.  The  latency  exploitation  is  nsed 
as  a  compensation  for  choosing  the  same  time  step  for  all  the  subcir¬ 
cuits.  The  major  advantage  of  each  subcircuit  using  its  own  time 
steps  is  to  save  CPU  time  vith  the  penalty  of  data  management;  in 
this  case,  the  latency  exploitation  scheme  skips  those  latent  subcir¬ 
cuits  to  save  CPU  time. 

In  this  research,  event-driven  algorithms  are  implemented  which 
sequence  the  subcircuits  to  be  simulated  on  the  basis  of  fan-in  fan¬ 
out  topology.  Because  the  linked  list  data  structure  is  being  used 

in  SLATE,  the  coupling  terms  are  easily  traced  from  the  matrix 

pointer.  Tith  the  exploitation  of  the  modified  Gauss-Soidel  method, 
the  coupling  terms  on  each  subcircuit  are  decoupled  by  using  a  pre¬ 
dictor,  such  that  the  subcircuits  can  be  solved  one  at  a  time. 

It  is  found  that  SLATE-R  is  about  a  factor  of  two  slower  than 
SLATE.  There  are  three  factors  that  slow  down  the  simulation  speed  : 
the  first  is  the  overhead  of  using  predictor  to  decouple  the  coupling 
terms;  the  second  is  the  larger  number  of  iterations  needed  to  get 
the  convergence;  and  the  third  is  that  the  tearing  nodes  and  the 

power  supply  nodes  are  split  for  each  subcircuit  which  induces  a 

larger  size  of  submatrix.  In  Section  5.2,  the  submatrix  size  for  each 
subcircuit  is  3x3;  however,  with  the  split  of  power  supply  nodes  and 
the  tearing  nodes,  the  submatrix  size  is  increased  to  7x7,  thus  more 
CPU  time  is  required  for  each  subcircuit  factorization.  For  future 
research,  in  order  to  speed  up  the  simulation  speed  of  SLATE-R,  one 

can  change  the  data  structure  such  that  all  the  voltage  sources  are 


set  in  the  first  diagonal  block  exactly  as  Equation  (5.4)  shows, 
rather  than  splitting  the  voltage  source  nodes,  and  each  snbcircnit 
is  solved  with  the  snbmatrix  which  contains  only  the  unknown  vari¬ 
ables.  The  other  approach  is  to  use  individual  time  steps  for  each 
subcircuit.  Currently  SLATE-R  and  SLATE  both  use  the  same  time  steps 
for  all  the  subcircuits  and  they  are  all  solved  with  the  smallest 
time  step.  In  SLATE,  all  the  subcircuits  are  processed  through  LU 
factorization  as  well  as  the  interconnection  matrix.  Thus,  in  using  a 
multiprocessor  system  to  solve  the  equations,  the  LU  factorization  of 
the  interconnection  matrix  could  create  a  bottleneck.  However,  the 
solution  strategy  of  the  relaxation  technique  is  to  solve  one  subcir¬ 
cuit  at  a  time  with  the  coupling  terms  relaxed;  therefore,  it  seems 
to  have  more  potential  for  implementation  in  a  multiprocessor  cosr- 
puter  system. 

Vith  the  potential  of  taking  full  advantage  of  both  the  tearing 
technique  and  the  relaxation  technique,  the  implementation  of  the 
relaxation  approach  to  multiprocessor  systems  will  be  a  promising 
topic  for  future  investigation.  In  order  to  implement  the  relaxation 
technique  in  the  multiprocessor  computer  system,  an  automatic  parti¬ 
tion  algorithm  that  separates  the  system  into  certain  subsystems  is 
very  important.  The  basic  idea  will  be  to  choose  terminals  with  a 
minimum  fan-out  as  the  partitions.  More  specifically,  choose  those 
possible  nodes, such  as  the  gates  in  MOS  circuits, as  the  set  of  candi¬ 
dates  for  partitioning,  and  try  to  partition  this  set  into  certain 
subsets  with  about  the  same  size  so  as  not  to  cause  any  bottleneck  in 
the  parallel  process. 
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